For PATRICIA ANN,
who contradicts the title because she 
has always been 
“the most” and is 


still the best 


When a quantity is the greatest or the 
least that it can be, at that moment it neither 
flows backwards nor forwards; for if it flows 
forwards or increases it was less, and will 
presently be greater than it is; and on the 
contrary if it flows backwards or decreases, 
then it was greater, and will presently be less 
than it is. 
— Isaac Newton on maximums and 
minimums, in Methodus fluxionum et 
serierum infinitarum, 1671 


There are hardly any speculations in
geometry more useful or more entertaining
than those which relate to maxima and
minima.

— the great Scottish mathematician Colin
Maclaurin, in A Treatise of Fluxions, 1742 


The great body of physical science, a 
great deal of the essential fact of financial 
science, and endless social and political 
problems are only accessible and only 
thinkable to those who have had a sound 
training in mathematical analysis, and the 
time may not be very remote when it will be 
understood that for complete initiation as an 
efficient citizen of one of the great complex 
world-wide States that are now developing, it 
is as necessary to be able to compute, to think 
in averages and maxima and minima, as it is 
now to be able to read and write. 

— H. G. Wells, from Mankind in the Making, 1903 
Preface to the Paperback Edition


All the greatest mathematicians have long 
since recognized that the method presented 
in this book is not only extremely useful in 
analysis, but that it also contributes greatly to 
the solution of physical problems. For since 
the fabric of the universe is most perfect, and 
is the work of a most wise Creator, nothing 
whatsoever takes place in the universe in 
which some relation of maximum and 
minimum does not appear. 

—the great Swiss-born mathematician Leonhard 
Euler, in Methodus inveniendi lineas curvas 
maximi minimive proprietate gaudentes, sive 
solutio problematis isoperimetrici latissimo
sensu accepti, 1744* 


* In English, A Method for Finding Curved Lines Enjoying Properties of Maximum
or Minimum, or Solution of Isoperimetric Problems in the Broadest Accepted Sense.
The quotation is taken from the opening paragraph of the book’s appendix, which
is titled “De curvis elasticis.” You can find a complete, annotated English
translation of the appendix in W. A. Oldfather et al., “Leonhard Euler’s Elastic
Curves,” Isis 20, no. 1 (November 1933): 72-160.

In a letter dated October 11, 1709, the well-known English
scientist Roger Cotes wrote to his even better-known friend Isaac
Newton. Cotes, who was in charge of preparing the second edition
of Newton’s monumental Principia for publication, had a gloomy
message to deliver, stating “It is impossible to print the book without
some faults.” Events proved him to be correct. After the appearance
of the second edition, Newton sent Cotes a list of new corrections,
which prompted Cotes to reply, in a letter dated December 22, 1713,
“I observe you have put down 20 Errata. . . . I believe you will not
be surprised if I tell you I can send you 20 more.” Cotes then went
on to reveal that while he was preparing the second edition he had
“made some hundreds [of additional corrections to the first edition]
with which I never acquainted you.”

Well, this book isn’t the Principia (and I’m no Newton), but it 
does share one characteristic with that genius’s masterpiece—the 
first editions of both books had some errors! Not quite so many in 
this book as Cotes mentioned, I think, but a few. The appearance 
of the paperback edition of When Least Is Best has given me the 
opportunity to make those missteps go away, and I gratefully thank 
Vickie Kearn, my longtime editor at Princeton University Press, for 
that opportunity. 

Besides typographical errors, there were two errors of citation 
omission that I would like to now correct. First, the discussion 
on pages 28 through 33 was motivated when I read the paper by 
Nathaniel Silver, “A Refraction Problem in Several Variables,” Amer- 
ican Mathematical Monthly, June-July 1987: 545-47. And second, the 
perfect basketball shot discussion on pages 158 through 165, al- 
though presented as a natural spin-off of Halley’s gunnery problem 
(for which I cited a 1997 paper by C. W. Groetsch) was actually dis- 
cussed sixteen years before the appearance of Professor Groetsch’s 
more general, historical paper, in an analysis by G. J. Porter, “New 
Angles on an Old Game,” American Mathematical Monthly, April 
1981: 285-86. 

In the discussion on pages 56 through 60, on Jacob Steiner’s 
flawed geometric proof of the isoperimetric theorem, I make ref- 
erence to Besicovitch’s solution to Kakeya’s problem. I make only 
some brief, general comments on what Besicovitch proved, but you 
can find much more in two papers: “On a Theorem of Besicovitch” 
by Hans Rademacher, and “On the Besicovitch-Perron Solution to 
the Kakeya Problem,” both of which are in Studies in Mathematical 
Analysis and Related Topics: Essays in Honor of George Pólya, edited
by Gábor Szegő et al. (Stanford University Press, 1962). In the
second paper, the “Perron” is the German mathematician Oskar
Perron (1880-1975), who in 1913 formulated an amusing “proof” to
illustrate the flaw in Steiner’s isoperimetric proof. As I discuss in the 
text, Steiner made his error right at the start, with his assumption 
that there actually is a closed curve of given length that encloses the 
maximum area. Assuming that an extrema question actually has an 
answer can lead one astray, however, as it does in dramatic fashion 
in Perron’s paradox, which is a “proof” that 1 is the largest integer! 
Here’s how it goes: start by assuming that there is in fact an N > 1
that is the largest integer. Then, N² is an integer and, of course,
N² > N, which is in conflict with the assumption that N is the
largest integer. Therefore our starting assumption that N > 1 must
be wrong and so it must be true that N = 1.

Now I'd be willing to bet that all readers of this book know that 
the proper concluding statement should actually be that the starting 
assumption that there actually is a largest integer is wrong, i.e., that 
the assumption that we can actually determine the largest integer is 
wrong. This is because we know how integers “work”: there is no
largest one because there is always a bigger one, no matter how big 
the one we think of is. Just add one! And that’s the whole point to 
Perron’s paradox, of course; in those problems where we really don’t 
know a priori how things “work,” the assumption of the existence of
a solution might well lead us into disaster. 

On page 259 there is a challenge problem for you to consider, 
based on the isoperimetric theorem, the proof of which has just 
been completed on the preceding pages. As I explain there, I don’t 
know how to solve that challenge problem, and in the first edition 
I asked readers to send me a solution if they had success. You can 
read the details of the challenge problem on page 259, but for now 
let me just say that the problem is that of finding a derivation of the 
(claimed) inequality 


$$\int_0^{2\pi}\sqrt{a^2\sin^2(t)+b^2\cos^2(t)}\,dt\ \ge\ \sqrt{4\pi\{\pi ab+(a-b)^2\}},$$


where a and b are non-negative (but otherwise arbitrary) constants. 
This inequality is arrived at in this book by purely geometric argu- 
ments—the challenge (for you) is to find an analytical derivation. 

I received just three letters. The first, from a reader in
Pennsylvania, claimed to have a proof. But it was simply a demonstration
that if an ellipse and a circle have the same perimeter, then the area 
of the ellipse is no greater than that of the circle. It was a clever 
bit of analysis, but of course, while true, it is just a special case of 
the isoperimetric theorem, which is proven in the book just before 
I state the challenge problem. The second letter (whose author did 
properly understand what was to be shown) was from the other end 
of the spectrum; it was from a physicist in Scotland who asserted
that the claimed inequality is “false and [so] there is no possible 
derivation for it.” He believed that my reasoning in arriving at the 
inequality contained “a deep flaw,” and that I had been led astray 
by “one of those rare situations where the crazy world of topology 
intrudes into the ‘real’ world in a ‘visible’ way.” Since all I do in the 
book is cut an ellipse into four good-sized parts and then rearrange 
them, I found his assertion to be just a bit hard to accept. Now, one 
might counter my reaction by observing that a simple half-twist to 
a long strip of paper, followed by joining the two distant ends of 
the strip, turns a two-sided object (the original strip) into a loop 
with a single side (the famous Möbius band) and that certainly is
a bizarre topological intrusion. Perhaps my shuffling of the ellipse 
pieces had done something equally weird. There was a lot of hand- 
written mathematical analysis included in the physicist’s letter to 
back up his words, and although it was clearly the work of an intel- 
ligent author, I was reluctant to devote what I was sure would be a 
time-consuming effort to wade through it. 

But, what if he was right? It wouldn't be the first time I had made a 
mistake! 

I decided to follow his suggestion that “perhaps the best check [of 
the inequality] would be to look at numerical values of the integral 
and compare with calculated values of $\sqrt{4\pi\{\pi ab+(a-b)^2\}}$.” He
admitted that he had not done that. I, on the other hand, keep a hot- 
to-trot MATLAB application idling on my computer’s desktop 24/7. 
After all, one never knows when a number-crunching emergency 
might occur—and if there ever was such an emergency, this was it! 
It was duck soup to write the brief code to do what the physicist 
suggested, and here’s how I proceeded. 

The claimed inequality can be slightly altered as follows:

$$b\int_0^{2\pi}\sqrt{\left(\frac{a}{b}\right)^2\sin^2(t)+\cos^2(t)}\,dt\ \ge\ b\sqrt{4\pi\left\{\pi\frac{a}{b}+\left(\frac{a}{b}-1\right)^2\right\}}.$$

Or, writing x = a/b, where 0 ≤ x < ∞, the challenge problem is
equivalent to analytically deriving the following inequality (valid
for all non-negative x):

$$\boxed{\int_0^{2\pi}\sqrt{x^2\sin^2(t)+\cos^2(t)}\,dt-\sqrt{4\pi\{\pi x+(x-1)^2\}}\ \ge\ 0}$$


Before studying the “truth” of this inequality by computer, there are
two special cases we can use to partially check the MATLAB coding.
For x = 0 the claim becomes

$$\int_0^{2\pi}\sqrt{\cos^2(t)}\,dt-\sqrt{4\pi}\ \ge\ 0,$$

that is,

$$\int_0^{2\pi}|\cos(t)|\,dt-2\sqrt{\pi}\ \ge\ 0.$$

(Notice, carefully, that $\int\sqrt{\cos^2(t)}\,dt\neq\int\cos(t)\,dt$.) Now, since
cos(t) ≥ 0 for 0 ≤ t ≤ π/2, and |cos(t)| encloses the same area over
each of the four quarter-periods of the interval 0 ≤ t ≤ 2π, our claim
becomes

$$4\int_0^{\pi/2}\cos(t)\,dt-2\sqrt{\pi}\ \ge\ 0$$

or,

$$4\{\sin(t)\}\Big|_0^{\pi/2}-2\sqrt{\pi}\ \ge\ 0$$

or,

$$4-2\sqrt{\pi}=0.4551>0,$$

which is, of course, true (even easier is to just recall that π < 4). For
x = 1 the claim becomes

$$\int_0^{2\pi}\sqrt{\sin^2(t)+\cos^2(t)}\,dt-\sqrt{4\pi^2}\ \ge\ 0$$

or,

$$\int_0^{2\pi}dt-2\pi\ \ge\ 0$$

or,

$$2\pi-2\pi=0\ \ge\ 0,$$

which is, of course, true. The results from our MATLAB code should
be consistent with these two particular calculations.
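The MATLAB code itself is not reproduced in the text. As a rough stand-in, here is a minimal Python sketch (my own illustration, not the original program; the function names and the grid of x values are arbitrary choices) that evaluates the left-hand side of the boxed inequality with the trapezoidal rule and checks it against the two special cases just worked out. The integral is the perimeter of an ellipse with semi-axes x and 1, and because the integrand is periodic the trapezoidal rule reduces to a plain sum of samples.

```python
import math

def ellipse_integral(x, n=4000):
    """Trapezoidal-rule value of the integral from 0 to 2*pi of sqrt(x^2 sin^2 t + cos^2 t) dt.

    This is the perimeter of an ellipse with semi-axes x and 1. The integrand is
    2*pi-periodic, so the trapezoidal rule collapses to h times a plain sum of samples.
    """
    h = 2 * math.pi / n
    return h * sum(
        math.sqrt((x * math.sin(k * h)) ** 2 + math.cos(k * h) ** 2) for k in range(n)
    )

def lhs(x):
    """Left-hand side of the boxed inequality; the claim is that it is never negative."""
    return ellipse_integral(x) - math.sqrt(4 * math.pi * (math.pi * x + (x - 1) ** 2))

if __name__ == "__main__":
    print("x = 0:", round(lhs(0.0), 4), "(text: 4 - 2*sqrt(pi) = 0.4551)")
    print("x = 1:", lhs(1.0), "(text: exactly 0)")
    xs = [i * 0.025 for i in range(201)]  # 0 <= x <= 5, the range plotted in Figure P
    v_min, x_min = min(zip((lhs(x) for x in xs), xs))
    print(f"smallest value on the grid: {v_min:.3e} at x = {x_min:.3f}")
```

A sampled check like this can only support the inequality on the grid it visits; it is evidence, not the analytical derivation the challenge asks for.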

Figure P shows the left-hand side of the boxed inequality, as a
function of x, over the interval 0 ≤ x ≤ 5. The plot agrees in
particular with our two special cases above, and in general with the
inequality, because the curve never dips below the x-axis. Because of
this plot, I conclude it is safe to say that the challenge problem still
stands. And, I should tell you, two weeks after his first letter arrived, I
got a second letter from the Scottish physicist telling me, in so many
words, “oops”: he had found a fatal error in his original analysis and
had graciously written to say “your result . . . stands unchallenged!”

[FIGURE P. Computer verification of the boxed inequality. Vertical axis:
left-hand side of the boxed inequality; horizontal axis: x, 0 ≤ x ≤ 5.]

Now, as long as I’ve been discussing a challenge problem, let me 
give you a new one to consider as you read the paperback edition of 
the book. 


New Challenge Problem for the Paperback Edition 


Let a, b, and c be the lengths of the sides of any triangle. Show
that

$$abc\ \ge\ (a+b-c)(b+c-a)(a+c-b).$$


Hint: You may find it helpful to recall the following result
from high school plane geometry (it was known to Euclid
and appears as a proposition in his Elements): the bisector
lines of any triangle’s three interior vertex angles cross at a
common point called the incenter (so named because that point
is guaranteed to be inside the triangle). For now I'll let you
see if you can prove this for yourself, too; but if you can’t,
its (elementary) proof is included in appendix I, along with a
derivation of the challenge inequality.

For the specific cases of the 45°-45°-90° and the 30°-60°-90°
triangles, you can verify by direct calculation that the
challenge inequality reduces (after isolating the square roots and
squaring) to the claims that 9 > 8 and 4 > 3, respectively, which
are both (of course!) true.
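A numerical spot check is no substitute for the requested derivation, and it makes no use of the incenter hint, but a short Python sketch can at least confirm the claim on the two special triangles and on many random ones. The sampling scheme, the side-length range, and the function names below are arbitrary choices for illustration only. For the special triangles the printed gaps are 3√2 − 4 and 4√3 − 6, which are positive precisely because 9 > 8 and 4 > 3; the equilateral triangle gives equality, which is why the inequality is stated with ≥.

```python
import math
import random

def triangle_gap(a, b, c):
    """abc - (a+b-c)(b+c-a)(a+c-b); the challenge claims this is never negative."""
    return a * b * c - (a + b - c) * (b + c - a) * (a + c - b)

def random_triangle(rng, low=0.01, high=10.0):
    """Draw sides uniformly from [low, high], rejecting triples that are not triangles."""
    while True:
        a, b, c = (rng.uniform(low, high) for _ in range(3))
        if a + b > c and b + c > a and a + c > b:
            return a, b, c

if __name__ == "__main__":
    # The special triangles from the text, scaled so the shortest side is 1.
    print("45-45-90:", triangle_gap(1.0, 1.0, math.sqrt(2)))    # mathematically 3*sqrt(2) - 4
    print("30-60-90:", triangle_gap(1.0, math.sqrt(3), 2.0))    # mathematically 4*sqrt(3) - 6
    print("equilateral:", triangle_gap(2.0, 2.0, 2.0))          # the equality case
    rng = random.Random(2007)
    worst = min(
        triangle_gap(*t) / (t[0] * t[1] * t[2])
        for t in (random_triangle(rng) for _ in range(100_000))
    )
    print("smallest relative gap over random triangles:", worst)
```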


I’ll conclude on the somber note that the mathematicians Leonid 
Khachiyan and George Dantzig, who appear in this book’s chapter 7 
discussion of linear programming, have both died since the publica- 
tion of the hardcover. Remarkably, their obituary notices appeared 
on the same day, next to each other, in the New York Times (May 23, 
2005, p. A17). Dantzig’s long, productive life ended at age ninety, 
while Khachiyan’s was cut tragically short by a heart attack at the 
young age of only fifty-two. 


Lee, New Hampshire 
January 24, 2007 


Preface


This is a history of mathematics book, but it is not simply 
a collection of biographical, prose essays on the lives of various 
mathematicians. There is a place for that sort of book (e.g., see E. T. 
Bell’s classic Men of Mathematics), but this isn’t one of them. What it 
is is the technical story of what many brilliant mathematicians have 
done in the subject of extrema over the last two dozen centuries. To 
be blunt, there is a lot of mathematics in this book. Stephen Hawk- 
ing’s famous line about how every equation cuts a book’s readership 
in half doesn’t apply here—that’s for coffee table books, ones more 
for displaying than for reading. This book is for readers with calluses 
on their fingers because they read with pencil and paper in hand! 
While I hope much of what you read here will be new and exciting 
to you, I do expect you to bring some intellectual background to 
the table. In general, what a science or engineering major learns in 
the first year of undergraduate calculus and physics is pretty much 
enough (I'll be specific in the next paragraph). Actually, as far as the 
physics goes, all you really need to remember is that force is a vector, 
and what potential and kinetic energy are. For the math, however, 
there is a list of things I am assuming that is just a bit longer. First of 
all, do you find the following question easy? If we assume x is real,
then what is

$$\lim_{n\to\infty}\frac{\sin(x)}{n}\ ?$$

The answer is, of course, zero, since no matter what x may be the
value of sin(x) is always in the interval −1 to +1, and so as n → ∞
the expression goes to zero. Now, when I ask even my second-year
engineering students this, I almost always get the correct answer of
zero, and so they are astounded when I tell them the answer is really
6! Then I write on the blackboard, without saying a word,

$$\lim_{n\to\infty}\frac{\sin(x)}{n}=\lim_{n\to\infty}\frac{\mathrm{si}\not n(x)}{\not n}=\mathrm{si}(x)=6.$$

If they laugh at this astonishing “calculation” then I take that as a 
good sign—only a student who has reached a certain level of skill 
and knowledge would find the above to be so wrong as to be funny. 

More seriously, I am assuming that you have a good background 
in high school algebra, trigonometry, and geometry, as well as in 
the elementary integration techniques of freshman calculus. For 
example, I am assuming that it will be unnecessary for me to explain
the quadratic formula, or what it means to quote a trig identity, or
what solid angles, hyperbolic functions, and factorials are (and that
0! = 1, not 0), or what it means to say a real number is irrational,
or what a vector is, or what the Pythagorean theorem is, or what 
it means to prove something by induction, or how to differentiate 
and integrate “simple” functions. On this last point, I expect you to
know that not only is

$$\int_0^1 x\,dx=\frac{1}{2}$$

but also, without actually doing the integrals, that we can write

$$\int_1^8\sin^{17}\!\left(x+\sqrt{x}\right)dx=\int_1^8\sin^{17}\!\left(y+\sqrt{y}\right)dy.$$


I will assume, finally, that the physical interpretation of an integral 
as the area under a curve is a familiar one. 

It might appear just a bit odd for me to assume you already know 
what a derivative is, since the evolution of that concept is a major 
part of chapter 4 in this book. But not to make that assumption 
is awkward; there will be places in the first three chapters where, 
to make a point, I’ll want to compare a noncalculus calculation
with one using differentiation. If you know what an integral is, then 
assuming you know what a derivative is seems (to me) to be logical. 

As a more specific (and more interesting) example of what I have
in mind, I am assuming that the following little analysis, while
perhaps astonishing in its conclusion, also is understandable. One of
the mathematical gems of seventeenth-century mathematics was
the discovery of a surface of revolution that, even though infinite in
extent, nevertheless bounds a finite volume. Prior to this discovery
it was commonly accepted that a surface extended to infinity would
necessarily have to be of infinite size, i.e., enclose infinite volume. In
contradiction to that common belief, in 1641 the Italian mathematical
physicist Evangelista Torricelli (1608-47) discovered that if the
first quadrant branch of the hyperbola xy = 1 (with x ≥ a) is rotated
about the x-axis, as shown in Figure I with a = 1, then the resulting
surface (resembling an infinitely long horn, sometimes called
Gabriel’s horn, after the Biblical story of the archangel who blew it
before making an announcement) bounds finite volume.

[FIGURE I. Torricelli’s paradoxical funnel.]

The demonstration of this result is technically quite simple
(which is why I am assuming you will be able to follow the details).
We first imagine that the volume is sliced up into arbitrarily many
thin cylindrical disks, each with thickness Δx. The radius of a disk
with its center at x is y = 1/x. Thus, the volume of that disk is, from
solid geometry, approximately given by

$$\Delta V\approx\pi y^2\,\Delta x=\frac{\pi}{x^2}\,\Delta x.$$

As Δx → 0, we can replace ΔV and Δx with the differentials dV
and dx, respectively; our approximation becomes exact, and so we
have

$$dV=\frac{\pi}{x^2}\,dx.$$

To find the total volume V we simply integrate from x = a to
infinity, and so

$$V=\int dV=\pi\int_a^\infty\frac{dx}{x^2}=\frac{\pi}{a}.$$


Torricelli’s result was thought paradoxical in the years following 
its announcement—see Paolo Mancosu and Ezio Vailati, “Torricelli’s 
Infinitely Long Solid and Its Philosophical Reception in the Seven- 
teenth Century” (Isis, March 1991, pp. 50-70); in 1672 the English 
philosopher Thomas Hobbes declared that one would have to be 
crazy to believe Torricelli. (If Hobbes’ philosophical powers had been 
equal to his mathematical skills no one would remember him to- 
day.) Even today Torricelli’s calculation can provoke a great deal of 
discussion in freshman calculus classrooms. Consider, for example, 
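the fact that the area A of the Torricelli surface of revolution is
infinite, a result easily confirmed by calculating the value of the area
integral:

$$A=2\pi\int_a^\infty y\sqrt{1+\left(\frac{dy}{dx}\right)^2}\,dx.$$

(You can find this general formula for the surface area of the rotated
curve y = y(x) discussed in section 6.9.) Thus, since y = 1/x, we
have

$$\frac{dy}{dx}=-\frac{1}{x^2},$$

and so

$$A=2\pi\int_a^\infty\frac{1}{x}\sqrt{1+\frac{1}{x^4}}\,dx=2\pi\int_a^\infty\frac{\sqrt{x^4+1}}{x^3}\,dx>2\pi\int_a^\infty\frac{\sqrt{x^4}}{x^3}\,dx=2\pi\int_a^\infty\frac{dx}{x}.$$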


The inequality follows as I have replaced the numerator of the
integral with an expression that is, for every x in the interval of
integration, smaller than the exact numerator, i.e., $x^4 < x^4+1$. But
the last integral diverges logarithmically and so A is infinite. This 
would appear to mean that we could not paint the interior of Tor- 
ricelli’s surface, as we would require an infinite amount of paint to 
cover an infinite area. And yet we could paint the interior by simply 
filling the finite volume with paint. We seem to have a paradox. 
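The contrast between the two integrals can also be watched numerically. The sketch below is my own illustration, not something from the text: it introduces a hypothetical cut-off point x = L (the text integrates all the way to infinity), takes a = 1, and uses a plain midpoint rule. As L grows, the computed volume settles toward π/a while the computed area keeps climbing, roughly like 2π ln(L/a), with no sign of leveling off.

```python
import math

def midpoint(f, lo, hi, n=100_000):
    """Composite midpoint rule: approximates the integral of f over [lo, hi] with n panels."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def horn_volume(a, length):
    """Volume of y = 1/x, a <= x <= length, rotated about the x-axis: pi * integral of dx/x^2."""
    return midpoint(lambda x: math.pi / x**2, a, length)

def horn_area(a, length):
    """Surface area of the same truncated horn: 2*pi * integral of (1/x) * sqrt(1 + 1/x^4) dx."""
    return midpoint(lambda x: 2 * math.pi * math.sqrt(1 + 1 / x**4) / x, a, length)

if __name__ == "__main__":
    a = 1.0
    for length in (10.0, 100.0, 1000.0):
        print(
            f"cut off at x = {length:6.0f}: "
            f"volume ~ {horn_volume(a, length):.5f} (pi/a = {math.pi / a:.5f}), "
            f"area ~ {horn_area(a, length):.2f} "
            f"(2*pi*ln(L/a) = {2 * math.pi * math.log(length / a):.2f})"
        )
```

No finite cut-off proves anything about the infinite horn, of course; that is the job of the closed form π/a and of the logarithmic lower bound derived above.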

Well, even if you are now mumbling to yourself over whether 
or not we can paint Torricelli’s surface, if the mathematics itself is 
understandable to you then you know all of the mathematics you 
need to know to start reading this book. (Can you see how to resolve 
this paradox? The answer is at the end of this preface, but don’t look 
until you’ve thought about it for a while. Hint: there is a difference 
between real paint and mathematical paint!) 

Finally, to conclude this little essay, to read this book for the 
maximum gain you should have something that professors like to 
call “mathematical maturity.” This is an attribute intentionally left 
fuzzy, meant only to describe a “mind ready to receive” new material 
(or perhaps old material in a new way). My quick test for mathemat- 
ical maturity is to see how a student reacts to the following little 
gem of reasoning. 

Recall that all real numbers can be separated into one of two 
sets—the rationals (expressible as m/n, the ratio of two integers), 
and the irrationals (all the remaining reals which are, of course, not 
rational). Every real number is one or the other. An example of each 
is 0.3333… = 1/3 and √2, respectively. In some sense, then, the
irrationals have a “more complicated” structure than the rationals. 
Now, here’s the problem: if we raise an irrational number to an 
irrational power, can the result be rational? Most students come 
down on the side of no, arguing that combining two irrationals 
through the power operation is “messy” and seems incapable of 
producing something simple like m/n. But then I show them it is 
possible, and watch how they respond. 

Start by considering $(\sqrt{2})^{\sqrt{2}}$, an irrational number raised to an
irrational power. It is itself a real number and so it is either rational or
it is irrational (we do not have to actually calculate it). If it is rational,
we are done. If it is irrational, then raise it to an irrational power,
e.g., consider $\left((\sqrt{2})^{\sqrt{2}}\right)^{\sqrt{2}}$. But this is
$(\sqrt{2})^{\sqrt{2}\cdot\sqrt{2}}=(\sqrt{2})^2=2$, which is rational
and so, again, we are done. Notice that we still don’t know from
this argument if $(\sqrt{2})^{\sqrt{2}}$ is rational or not, and it doesn’t matter! (It
has been known since 1930 that $(\sqrt{2})^{\sqrt{2}}$ is not only irrational, but
transcendental.)

To judge mathematical maturity, I look for two things. First, of 
course, is simply that there is a technical understanding of the ar- 
gument. But I also want to see excitement, a “Wow, what a neat 
proof!” reaction. For me, that’s the signature of a mind “ready to 
receive.” I hope that was your reaction to the above, and that this 
book will give you lots more (indeed, a maximum) of “Wow, that’s 
neat!” moments. 


Painting Torricelli’s Funnel 


The reason for the “paradox” is that you are simultaneously 
holding two contradictory ideas about the nature of paint. 
Real paint has a molecular structure, i.e., there is a smallest 
(nonzero) volume of real paint, while mathematical paint is 
infinitely divisible. Consistently using either one of these two 
conceptions of paint removes the paradox. Here’s how. 

For mathematical paint: We can indeed paint the funnel’s 
inner surface by simply filling the funnel with a finite volume 
of paint. It does not follow, however, that it takes an infinite 
volume of mathematical paint to cover an infinite surface area, 
since the thickness of mathematical paint is zero. That is, infi- 
nite area times zero thickness is an indeterminate volume. The 
paradox has vanished. 

For real paint: It would, indeed, require an infinite volume 
of real paint to paint the outer surface of the funnel because 
real paint has a nonzero thickness. But it is impossible to paint 
the entire inner surface (equal in area, of course, to the outer 
surface area) because the paint won’t fit! At some point along 
the ever narrowing funnel, the opening will be smaller than 
a single molecule of real paint. This means we simply cannot 
compare the two different ways of painting the funnel since 
filling the funnel with real paint cannot even be done. 

We have a “paradox” only if we imagine filling the funnel 
with mathematical paint but painting the outer surface with 
real paint. Getting a paradox by changing the rules in “mid-game”
is no surprise at all.


When Least Is Best


1. Minimums, Maximums, Derivatives, and Computers


1.1 Introduction 


This book has been written from the practical point of view of the 
engineer, and so you'll see few rigorous proofs on any of the pages 
that follow. As important as such proofs are in modern mathematics, 
I make no claims for rigor in this book (plausibility and/or direct 
computation are the themes here), and if absolute rigor is what you 
are after, well, you have the wrong book. Sorry! 

Why, you may ask, are engineers interested in minimums? That 
question could be given a very long answer, but instead I’ll limit 
myself to just two illustrations (one serious and one not, perhaps, 
quite as serious). Consider first the problem of how to construct a 
gadget that has a fairly short operational lifetime and which, during 
that lifetime, must perform flawlessly. Short lifetime and low failure 
probability are, as is often the case in engineering problems, po- 
tentially conflicting specifications: the first suggests using low-cost 
material(s) since the gadget doesn’t last very long, but using cheap 
construction may result in an unacceptable failure rate. (An exam- 
ple from everyday life is the ordinary plastic trash bag—how thick 
should it be? The bag is soon thrown away, but we definitely will be 
unhappy if it fails too soon!) The trash bag engineer needs to calcu- 
late the minimum thickness that still gives acceptable performance. 
For my second example, let me take you back to May 1961, to the
morning the astronaut Alan Shepard lay on his back atop the rocket 
that would make him America’s first man in space. He was very brave 
to be there, as previous unmanned launches of the same type of 
rocket had shown a disturbing tendency to explode into stupendous 
fireballs. When asked what he had been thinking just before blastoff, 
he replied “I was thinking that the whole damn thing had been built 
by the lowest bidder.” 

This book is a math history book, and the history of minimums 
starts centuries before the time of Christ. So, soon, I will be starting 
at the beginning of our story, thousands of years in the past. But 
before we climb into our time machine and travel back to those 
ancient days, there are a few modern technical issues I want to 
address first. 

First, to write a book on minimums might seem to be a bit
narrow; why not include maximums, too? Why not write a history of
extrema, instead? Well, of course minimums and maximums are
certainly intimately connected, since a maximum of y(x) is
a minimum of −y(x). To be honest, the reason for the book’s
title is simply that I couldn’t think of one I could use with extrema
as catchy as is “When Least Is Best.” I did briefly toy with “When
Extrema Are xxx” with the xxx replaced with exotic, exciting, and
even (for a while, in a temporary fit of marketing madness that I
hoped would attract Oprah’s attention), erotic. Or even “Minimums
Are from Venus, Maximums Are from Mars.” But all of those (cer- 
tainly the last one) are dumb, and so it stayed “When Least Is Best.” 
There will be times, however, when I will discuss maximums, too. 
And now and then we’ll use a computer as well. 

For example, consider the problem of finding the maximum value 
of the rather benign-looking function 


$$y(x)=3\cos(4\pi x-1.3)+5\cos(2\pi x+0.5).$$


Some students answer too quickly and declare the maximum value
is 8, believing that for some value of x the individual maximums of
the two cosine terms will add. That is not the case, however, since
it is equivalent to saying that there is some x such that

$$4\pi x-1.3=2\pi n,\qquad 2\pi x+0.5=2\pi k,$$

where n and k are integers. That is, those students are assuming there
is an x such that

$$x=\frac{2\pi n+1.3}{4\pi}=\frac{2\pi k-0.5}{2\pi},\qquad n\text{ and }k\text{ integers}.$$

Thus,

$$2\pi n+1.3=4\pi k-1,$$

or

$$2.3=4\pi k-2\pi n=2\pi(2k-n),$$

or

$$\pi=\frac{2.3}{2(2k-n)}=\frac{23}{20(2k-n)}.$$

But if this is actually so, then as n and k are integers we would have
π as the ratio of integers, i.e., π would be a rational number. Since
1761, however, π has been known to be irrational and so there are
no such integers n and k. And that means there is no x such that
y(x) = 8, and so $y_{\max} < 8$.

Well, then, what is $y_{\max}$? Is it perhaps close to 8? You might try
setting the derivative of y(x) to zero to find x, but that quickly leads
to a mess. (Try it.) The best approach, I think, is to just numerically
study y(x) and watch what it does. The result is that $y_{\max}=5.7811$,
significantly less than 8. My point in showing you this is
twofold. First, a computer is often quite useful in minimum studies 
(and we will use computers a lot in this book). Second, taking the 
derivative of something and setting it equal to zero is not always 
what you have to do when finding the extrema of a function. 
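The text does not say how y(x) was studied numerically; a brute-force grid search is one simple way to do it. The Python sketch below is an illustration under that assumption (the grid size of 200,000 steps is my choice). It relies on y(x) having period 1, since cos(4πx − 1.3) has period 1/2 and cos(2πx + 0.5) has period 1, so scanning 0 ≤ x ≤ 1 covers every value the function takes. For a function this smooth the grid maximum should agree with the 5.7811 quoted above to the digits shown.

```python
import math

def y(x):
    """The function from the text: 3 cos(4 pi x - 1.3) + 5 cos(2 pi x + 0.5)."""
    return 3 * math.cos(4 * math.pi * x - 1.3) + 5 * math.cos(2 * math.pi * x + 0.5)

def grid_max(f, lo, hi, n):
    """Brute-force maximum of f sampled at n + 1 equally spaced points of [lo, hi]."""
    best_x, best_y = lo, f(lo)
    for k in range(1, n + 1):
        x = lo + (hi - lo) * k / n
        fx = f(x)
        if fx > best_y:
            best_x, best_y = x, fx
    return best_x, best_y

if __name__ == "__main__":
    # One full period [0, 1] contains the global maximum.
    x_star, y_star = grid_max(y, 0.0, 1.0, 200_000)
    print(f"y_max ~ {y_star:.4f} at x ~ {x_star:.5f}")
```

No derivative appears anywhere in the search, which is exactly the second point of the paragraph above.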

An amusing (and perhaps, for people who like to camp, even use- 
ful) example of this is provided by the following little puzzle. Imag- 
ine that you have been driving for a long time along a straight road 
that borders an immense, densely wooded area. It looks enticing, 
and so you park your car on the side of the road and hike into the 
woods for a mile along a straight line perpendicular to the road. The 
woods are very dense (you instantly lose sight of the road when you 
are just one step into the woods), and after a mile you are exhausted. 
You call it a day and camp overnight. When you get up the next
morning, however, you’ve completely lost your bearings and don’t 
know which direction to go to get back to your car. You could, if 
you panic, wander around in the woods indefinitely! But there is a 
way to travel that absolutely guarantees that you will arrive back at 
your car's precise location after walking a certain maximum distance 
(it might take even less). How do you walk out of the woods, and 
what is the maximum distance you would have to walk? The answer 
requires only simple geometry—if you are stumped the answer is at 
the end of this chapter. 


1.2 When Derivatives Don’t Work 


Here’s another example of a minimization problem that calculus is not
only not required to solve, but in fact seems unable to solve. Suppose
we have the real line before us (labeled as the x-axis),
stretching from −∞ to +∞. On this line there are marked n points,
labeled in increasing value as $x_1 < x_2 < \cdots < x_n$. Let’s assume all the
$x_i$ are finite (in particular $x_1$ and $x_n$), and so the interval of the x-axis
that contains all n points is finite in length. Now, somewhere (anywhere)
on the finite x-axis we mark one more point (let’s call it x).
We wish to pick x so that the sum of the distances between x and all
of the original points is minimized. That is, we wish to pick x so that

$$S=|x-x_1|+|x-x_2|+\cdots+|x-x_n|$$

is minimized. I’ve used absolute-value signs on each term to ensure
each distance is non-negative, independent of where x is, either to
the left or to the right of a given $x_i$. Those absolute-value signs may
seem to badly complicate matters, but that’s not so. Here’s why.

First, focus your attention on the two points that mark the ends of the interval, $x_1$ and $x_n$. The sum of the distances between x and $x_1$, and between x and $x_n$, is

$$|x - x_1| + |x - x_n|,$$

and this is at least $|x_1 - x_n|$. If $x > x_n$, or if $x < x_1$ (i.e., if x is outside the interval), then strict inequality holds, but if x is anywhere inside the interval (i.e., $x_1 \le x \le x_n$) then equality holds. Thus, the minimum value of $|x - x_1| + |x - x_n|$ is achieved by placing x anywhere between $x_1$ and $x_n$.

Next, shift your attention to the two points $x_2$ and $x_{n-1}$. We can repeat the above argument, without modification, to conclude that the minimum value of $|x - x_2| + |x - x_{n-1}|$ is achieved when x is anywhere between $x_2$ and $x_{n-1}$. Note that this automatically satisfies the condition for minimizing the value of $|x - x_1| + |x - x_n|$, i.e., placing x anywhere between $x_2$ and $x_{n-1}$ minimizes $|x - x_1| + |x - x_2| + |x - x_{n-1}| + |x - x_n|$. You can now see that we can repeat this line of reasoning, over and over, to conclude


$|x - x_3| + |x - x_{n-2}|$ is minimized by placing x anywhere between $x_3$ and $x_{n-2}$,

$|x - x_4| + |x - x_{n-3}|$ is minimized by placing x anywhere between $x_4$ and $x_{n-3}$,

and finally, if we suppose that n is an even number of points, then

$|x - x_{n/2}| + |x - x_{(n/2)+1}|$ is minimized by placing x anywhere between $x_{n/2}$ and $x_{(n/2)+1}$.


So, we simultaneously satisfy all of these individual minimizations by placing x anywhere between $x_{n/2}$ and $x_{(n/2)+1}$ (if n is even), and this of course minimizes S.

But what if n is odd? Then the same reasoning as for even n still works, until the final step; then there is no second point to pair with $x_{(n+1)/2}$. Thus, simply let $x = x_{(n+1)/2}$, so that $|x - x_{(n+1)/2}| = 0$, which is certainly the minimum value for a distance. Thus, we have the somewhat unexpected, noncalculus solution that, for n even, S is minimized by placing x anywhere in an interval, but for n odd there is just one, unique value for x (the middle $x_i$) that minimizes S.
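The pairing argument is easy to check numerically. The sketch below is only an illustration: the helper names `total_distance` and `best_points_on_grid`, the grid spacing, and the sample points are all my own choices. It evaluates S on a fine grid and reports every grid value of x where S attains its smallest value. For an odd number of points it lands on the middle point alone; for an even number it returns a whole run of values between the two middle points.

```python
def total_distance(x, points):
    """S(x) = |x - x_1| + |x - x_2| + ... + |x - x_n|."""
    return sum(abs(x - p) for p in points)

def best_points_on_grid(points, lo, hi, steps=1000):
    """Return every grid value in [lo, hi] where S takes its smallest grid value."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    values = [total_distance(x, points) for x in grid]
    smallest = min(values)
    # A tiny tolerance absorbs floating-point round-off in the sums.
    return [x for x, v in zip(grid, values) if v <= smallest + 1e-9]

odd = [1.0, 2.0, 7.0, 9.0, 10.0]   # n = 5: the middle point is 7
even = [1.0, 2.0, 7.0, 9.0]        # n = 4: any x from 2 to 7 should do

print(best_points_on_grid(odd, 0, 10))    # -> [7.0]
ev = best_points_on_grid(even, 0, 10)
print(min(ev), max(ev))                   # -> 2.0 7.0
```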


1.3 Using Algebra to Find Minimums


As another elementary, but certainly not trivial, example of the claim that derivatives are not always what you want to calculate, consider the fact that ancient mathematicians knew that of all rectangles with a given perimeter it is the square that has the largest area. (This is a special case of a general class of maximum/minimum questions of great historical interest and practical value called isoperimetric problems, and I’ll have more to say about them in the next chapter.) Ask most modern students to show this and you will almost surely get back something like the following. Define P to be the given perimeter of a rectangle, with x denoting one of the two side lengths. The other side length is then (P − 2x)/2, and so the area of the rectangle is


$$A(x) = x\left(\frac{P - 2x}{2}\right) = \frac{1}{2}Px - x^2.$$

A(x) is maximized by setting $dA/dx = \frac{1}{2}P - 2x$ equal to zero, and so $x = \frac{1}{4}P$, which completes the proof. Using only algebra, however, an ancient mathematician could have argued that

$$A = \frac{1}{2}Px - x^2 = \frac{P^2}{16} - \left(x^2 - \frac{1}{2}Px + \frac{P^2}{16}\right) = \frac{P^2}{16} - \left(x - \frac{P}{4}\right)^2 \le \frac{P^2}{16},$$

since $(x - P/4)^2 \ge 0$ for all x. That is, A is never larger than the constant $P^2/16$ and is equal to $P^2/16$ if and only if (a useful phrase I will henceforth write as simply iff) $x = P/4$, which completes the ancient, noncalculus proof.
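Because the completed-square form is an algebraic identity, it can be checked with exact rational arithmetic rather than floating point. This is a minimal sketch; the function names and the sample perimeter P = 20 are illustrative assumptions. It confirms that the two expressions for A agree at every sample point and that neither ever exceeds $P^2/16$.

```python
from fractions import Fraction

def area(P, x):
    """Area of a rectangle with perimeter P and one side x: x(P - 2x)/2."""
    return x * (P - 2 * x) / 2

def area_completed_square(P, x):
    """The same area written as P^2/16 - (x - P/4)^2."""
    return P * P / 16 - (x - P / 4) ** 2

P = Fraction(20)
for x in [Fraction(k, 4) for k in range(41)]:          # x = 0, 1/4, ..., 10
    assert area(P, x) == area_completed_square(P, x)   # exact identity
    assert area(P, x) <= P * P / 16                    # never above P^2/16

print(area(P, P / 4), P * P / 16)   # -> 25 25
```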

As a final comment on this result, which again illustrates the 
intimate connection between minimum and maximum problems, 
we can restate matters as follows: of all rectangles with a given 
area, the square has the smallest perimeter. This is the so-called 
dual of our original problem and, indeed, all isoperimetric prob- 
lems come in such pairs. I’ll prove this particular dual in section 
1.5. Another useful isoperimetric result that seems much like the 
one just established—one also known to the precalculus, ancient 
mathematicians—is not so easy to prove: of all the triangles with 
the same area, the equilateral has the smallest perimeter. See if you 
can show this (or its dual) before I do it later in this chapter. 

We can use the previous result—of all rectangles with a fixed 
perimeter, the square has the maximum area—to solve without 


MINIMUMS AND MAXIMUMS 7 


calculus a somewhat more complicated appearing problem found 
in all calculus textbooks. Suppose we wish to enclose a rectangular 
plot of land with a fixed length of fencing, with the side of a barn 
forming one side of the enclosure. How should the fencing now be 
used? We could, of course, use calculus as follows: let x be the length of each of the two sides perpendicular to the barn wall, and $\ell - 2x$ be the length of the side parallel to the barn wall ($\ell$ is the fixed, total length of the fencing). Then the enclosed area is

$$A = x(\ell - 2x) = \ell x - 2x^2,$$

and so

$$\frac{dA}{dx} = \ell - 4x,$$

which, when set equal to zero, gives $x = \ell/4$. Thus, $\ell - 2x = \ell/2 = 2x$, which says the enclosed area is maximized when it is twice as long as it is wide. But this solution is far more sophisticated than required. Simply imagine that we enclose another rectangular area on the other side of the barn wall. We already know that, together, the two rectangular plots should form a square, and so each of the two rectangular plots is half of the square, i.e., twice as long in one dimension as in the other.
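A brute-force search gives an independent check of the barn-wall result. This sketch assumes a fence length of 100 units (an arbitrary choice), and the helper names are hypothetical. It tries every width x on a fine grid from 0 to ℓ/2 and keeps the one with the largest enclosed area.

```python
def enclosed_area(x, length):
    """Area x * (length - 2x) when the barn wall serves as one side."""
    return x * (length - 2 * x)

def best_width(length, steps=10_000):
    """Grid search over 0 <= x <= length/2 for the area-maximizing x."""
    candidates = [length / 2 * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda x: enclosed_area(x, length))

ell = 100.0
x = best_width(ell)
print(x, ell - 2 * x)   # -> 25.0 50.0  (the side along the barn is twice the width)
```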

Our ancient mathematician’s trick of completing the square is a 
very old one, and some historians claim that it can be found implicit 
in Euclid’s Elements (Book 6, Proposition 27), circa 300 B.C. There, the
problem discussed is equivalent to that of dividing a constant into 
two parts so that their product is maximum. So, if the constant is C, 
then the two parts are x and C — x, with the product 


$$M = x(C - x) = Cx - x^2 = -(x^2 - Cx) = -\left(x^2 - Cx + \frac{C^2}{4} - \frac{C^2}{4}\right) = \frac{C^2}{4} - \left(x - \frac{C}{2}\right)^2.$$

Thus, as $(x - C/2)^2 \ge 0$ for all x, M is never larger than $C^2/4$ and is equal to $C^2/4$ iff $x = C/2$.

Stated this way, Euclid’s problem surely seems rather abstract, but in 1673 the Dutch mathematical physicist Christiaan Huygens gave a nice physical setting to the calculation. Suppose we have a line and two points (A and B) not on the line. Where should the point C be located on the line so that the sum of the squares of the distances from C to A and C to B, $(AC)^2 + (BC)^2$, is minimum? Without loss of generality we can draw the geometry of this problem as shown in figure 1.1, with A on the y-axis. The figure shows A and B on the same side of the line, and places C between A and B, but as the analysis continues you’ll see that these assumptions in no way affect the result.

Figure 1.1. Huygens’s problem.

In the notation of the figure we are to find the value of x that, with a, b, and c constants, minimizes $d_1^2 + d_2^2$. Now,

$$d_1^2 + d_2^2 = \{x^2 + a^2\} + \{(b - x)^2 + c^2\} = a^2 + b^2 + c^2 + 2x(x - b) = a^2 + b^2 + c^2 - 2x(b - x).$$

Thus, we need to maximize the product $x(b - x)$; but we already know from Euclid how to do that: set $x = \frac{1}{2}b$. That is, C is (horizontally) midway between A and B. If you redraw figure 1.1 so that either $x > b$ or $x < 0$, and then write the expression for $d_1^2 + d_2^2$, you’ll see that the result is unchanged.
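A numerical scan makes it easy to see that the sum of squares has a minimum (and no maximum) along the line. The sketch assumes the coordinates implied by the identity above: A at height a above the origin, B at horizontal distance b and height c, and C = (x, 0). The sample values a = 3, b = 8, c = 5 are arbitrary, and the scan deliberately covers x < 0 and x > b as well.

```python
def sum_of_squares(x, a, b, c):
    """d1^2 + d2^2 with A = (0, a), B = (b, c), and C = (x, 0)."""
    return (x**2 + a**2) + ((b - x)**2 + c**2)

a, b, c = 3.0, 8.0, 5.0
xs = [-b + 3 * b * i / 3000 for i in range(3001)]   # x from -b to 2b
best = min(xs, key=lambda x: sum_of_squares(x, a, b, c))
print(best)   # -> 4.0, i.e. b/2
```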

An elementary example of an extremal problem in which there is (by the very nature of the problem) nothing to differentiate comes from discrete probability theory. There the independent variable does not vary continuously but, rather, in discontinuous jumps. In such cases, taking a derivative simply has no meaning. So, suppose we toss four fair dice, i.e., each one of the six faces on each die has probability 1/6 of showing. What is the most likely number of dice that will show a 3? The answer can only be one of five numbers, of course, the integers zero through four. If we define the value of the random variable X as the number of dice that show a 3, then elementary probability theory tells us that the probability that X = k is given by

$$P(X = k) = \binom{n}{k}\left(\frac{1}{6}\right)^k\left(\frac{5}{6}\right)^{n-k},$$

where n is the number of dice and $\binom{n}{k} = n!/(k!(n-k)!)$. So, with n = 4,


$$P(X = 0) = \frac{625}{1296},$$

$$P(X = 1) = \frac{500}{1296},$$

$$P(X = 2) = \frac{150}{1296},$$

$$P(X = 3) = \frac{20}{1296},$$

$$P(X = 4) = \frac{1}{1296}.$$

Thus, the most likely number of 3’s to show is zero. But even more likely to happen is that at least one 3 shows, as

$$P(X \ge 1) = \sum_{k=1}^{4} P(X = k) = \frac{671}{1296} > P(X = 0).$$
k=1 


This strikes many as a paradoxical result, but that is part of the 
inexhaustible charm of probability! 
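These five probabilities can be reproduced exactly with rational arithmetic. This is a minimal sketch using Python’s standard `fractions.Fraction` and `math.comb`; the function name `binomial_pmf` is my own label for the formula above.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k), computed exactly."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = Fraction(1, 6)                    # chance that one fair die shows a 3
probs = {k: binomial_pmf(k, 4, p) for k in range(5)}
for k, pr in probs.items():
    print(k, pr * 1296, "/ 1296")     # numerators 625, 500, 150, 20, 1

at_least_one = 1 - probs[0]
print(at_least_one)                   # -> 671/1296
```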

1.4 A Civil Engineering Problem 

As a more sophisticated example of how minimization problems can sometimes be attacked with noncalculus approaches, consider the following. We have two towns, A and B, on opposite sides of a river with constant width w. As shown in figure 1.2, A is distance a from the river, B is distance b, and the lateral separation of the two towns is d. Our problem is to determine where we should build a bridge over the river (perpendicular to the river’s banks) so as to make the journey between A and B as short as possible. That is, what is x?

Figure 1.2. Minimum-distance bridge placement problem.

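With calculus, this question is not hard to answer. We simply write the total distance as

$$T = \sqrt{a^2 + x^2} + w + \sqrt{b^2 + (d - x)^2}$$

and then set $dT/dx = 0$. Thus,

$$\frac{dT}{dx} = \frac{1}{2}\,\frac{2x}{\sqrt{a^2 + x^2}} - \frac{1}{2}\,\frac{2(d - x)}{\sqrt{b^2 + (d - x)^2}},$$

and setting this equal to zero gives $x\sqrt{b^2 + (d - x)^2} = (d - x)\sqrt{a^2 + x^2}$. Squaring both sides leaves $b^2x^2 = a^2(d - x)^2$, so $bx = a(d - x)$ and

$$x = \frac{ad}{a + b}.$$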


Ancient mathematicians could also have solved this problem, however, long before the invention of the calculus, using just elementary geometry. To see how, let me first make a fundamental, exceedingly important, and useful mathematical observation called the triangle inequality. The triangle inequality asserts that, given any triangle, the sum of any two of its sides is at least as large as the third side. It is really just a statement of the fact that the shortest path connecting two points in a plane is the straight line passing through the two points. Thinking of the triangle’s sides as directed line segments with both magnitude and direction (i.e., as vectors), we can write u and v as two of the sides and u + v as the third side, as shown in figure 1.3.

Figure 1.3. Vector addition.

The triangle inequality says that $|\mathbf{u}| + |\mathbf{v}| \ge |\mathbf{u} + \mathbf{v}|$, where the absolute value signs denote the length of the vector. It is obvious that the inequality becomes an equality iff u and v point in the same direction (and so the triangle collapses to the “trivial triangle” with zero area). We can, in fact, now drop the imagery of the triangle itself, and simply think of u and v as any two vectors not necessarily associated with a triangle (although in many problems there will be a triangle).

Now, redraw figure 1.2 as figure 1.4 and label the various path segments as vectors. Notice that no matter what x is, the sum $(\mathbf{a} + \mathbf{x}) + (\mathbf{d} - \mathbf{x} + \mathbf{b})$ is constant. Mathematically this is trivial (the two x’s cancel), but physically this is because of the important observation that every vector sum (plus a constant w term to account for the bridge) starts at A and ends at B, no matter what x may be. By the triangle inequality,

$$|\mathbf{a} + \mathbf{x}| + |\mathbf{d} - \mathbf{x} + \mathbf{b}| \ge |\mathbf{a} + \mathbf{d} + \mathbf{b}|,$$

and equality (which is the minimum sum) is achieved only when $\mathbf{a} + \mathbf{x}$ and $\mathbf{d} - \mathbf{x} + \mathbf{b}$ point in the same direction. That is, when $\theta = \alpha$ in the notation of figure 1.4.

Figure 1.4. Bridge geometry in vector notation.

Since the two triangles in figure 1.4 are right triangles with their other two angles equal, they are similar triangles. Thus, dropping the vector notation, we have

$$\frac{a}{x} = \frac{b}{d - x},$$

which is easily solved to give the location of the bridge at

$$x = \frac{ad}{a + b},$$

just as before. But this time no derivative was required. And, in fact, our ancient mathematician’s solution actually provides some immediate extra physical insight that the calculus one does not: since $\theta = \alpha$, the path segments connecting each town to its respective river bank are parallel.
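As a cross-check on both solutions, the total path length can be minimized numerically. The sketch below uses ternary search, which is valid here because each square-root term is convex in x, so their sum is too. The sample dimensions (a = 2, b = 3, d = 10, w = 0.5) and the helper names are illustrative assumptions, not values from the text.

```python
import math

def path_length(x, a, b, d, w):
    """Total A-to-B distance when the bridge sits at lateral offset x."""
    return math.hypot(a, x) + w + math.hypot(b, d - x)

def ternary_min(f, lo, hi, iters=200):
    """Locate the minimizer of a convex function f on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2          # the minimum cannot lie to the right of m2
        else:
            lo = m1          # the minimum cannot lie to the left of m1
    return (lo + hi) / 2

a, b, d, w = 2.0, 3.0, 10.0, 0.5
x_num = ternary_min(lambda x: path_length(x, a, b, d, w), 0.0, d)
print(x_num, a * d / (a + b))   # both approximately 4.0
```

The width w only shifts T by a constant, so, as the geometric argument predicts, it has no effect on where the bridge goes.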


1.5 The AM-GM Inequality 


There are yet other methods the mathematicians of old, in the days 
before calculus, could have used to solve many problems that seem- 
ingly require the calculation of derivatives. One of the most elegant 
of these methods is what is called the AM-GM inequality (the arith- 
metic mean-geometric mean inequality). It is easy to state: 


If $x_1, x_2, \ldots, x_n$ are any n positive numbers, n > 1, and

if $A = (1/n)(x_1 + x_2 + \cdots + x_n)$ is the arithmetic mean of the x’s
and if $G = (x_1x_2\cdots x_n)^{1/n}$ is the geometric mean of the x’s,
then $A \ge G$ with equality iff $x_1 = x_2 = \cdots = x_n$.


New demonstrations of this famous and remarkably useful in- 
equality appear on a regular basis to this day, but one of the easiest 
to understand (as well as one of the most elegant) is the 1954 proof 
by a mathematician named G. Ehlers. I know nothing more about 
Ehlers, but his proof of the AM-GM inequality is a gem and you can 
find it in appendix A. That proof uses just simple algebra and induc- 
tion, but no calculus, which is appropriate since the whole point here 
is to show how we can solve many minimum/maximum problems 
without the techniques of calculus. 
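A numerical experiment is no substitute for Ehlers’s proof, but it can build confidence in the statement. This sketch (seed, ranges, and function names are arbitrary choices) draws many random lists of positive numbers and checks that the arithmetic mean is never below the geometric mean. The geometric mean is computed through logarithms to avoid overflow in the product.

```python
import math
import random

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # exp of the average log equals the n-th root of the product.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

random.seed(1)
for _ in range(1000):
    n = random.randint(2, 10)
    xs = [random.uniform(0.01, 100.0) for _ in range(n)]
    # Tiny relative slack allows for floating-point round-off.
    assert arithmetic_mean(xs) >= geometric_mean(xs) * (1 - 1e-12)

equal = [3.7] * 5
print(arithmetic_mean(equal), geometric_mean(equal))   # equal, up to round-off
```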

For example, recall the isoperimetric dual problem mentioned at 
the start of section 1.3: show that of all rectangles with a given area 
it is the square that has the smallest perimeter. This is actually quite 
easy to demonstrate with the AM-GM inequality. If we call the sides 
of the rectangle x and y, then the problem is to determine x and y 
so that we minimize 


P=2x+2y=2(x+y), 
given that 
A=xy 


is a constant. From the AM-GM inequality with n = 2 we immedi- 
ately have 
$$\frac{1}{2}(x + y) \ge \sqrt{xy} = \sqrt{A}$$

with equality iff x = y. That is,

$$P = 2(x + y) \ge 4\sqrt{A},$$

which says P is never smaller than the constant $4\sqrt{A}$ and is equal to that constant iff x = y (iff the rectangle is a square).

Closely related to this result is one concerning right triangles. 
Imagine all possible right triangles with perpendicular sides of 
lengths x and y that sum to a constant, i.e., 


$$x + y = k.$$

If we write A to denote the areas of the triangles, then

$$A = \frac{1}{2}xy.$$

Now, the AM-GM inequality for n = 2 says

$$\frac{x + y}{2} = \frac{k}{2} \ge \sqrt{xy} = \sqrt{2A}$$

with equality iff x = y. Thus,

$$A \le \frac{k^2}{8}$$

with equality iff x = y. This shows that of all right triangles with perpendicular sides that sum to a constant, it is the isosceles right triangle that has the largest area (a result known since ancient times).

For another elegant illustration of the power of the AM-GM in- 
equality, think back a bit to a question I asked you to ponder: of all 
triangles with a given area, show that it is the equilateral that has 
the smallest perimeter. Did you have any success doing that? It’s 
not trivial! I’ll do it here with the aid of the AM-GM inequality by 
showing the dual theorem: of all triangles with a given perimeter P, the equilateral has the largest area. As a prelude, recall the amazing formula for the area A of any triangle in terms of the lengths of its sides (a, b, and c). This formula is named after the Egyptian mathematician Heron of Alexandria, who is thought to have lived in the first century A.D. Some historians have speculated that the formula was known by Archimedes three centuries earlier, but there is no real evidence of that (other than Archimedes’ genius, which makes it probable that he did know it), while the formula does appear in Heron’s Metrica. It is not an easy formula to derive [see William Dunham, Journey through Genius: The Great Theorems of Mathematics (John Wiley, 1990), pp. 118-27], but it is easy to state:

$$A = \sqrt{s(s - a)(s - b)(s - c)},$$

where $s = \frac{1}{2}(a + b + c) = \frac{1}{2}P$, the so-called semiperimeter of the triangle. Since P is given, then so is s, and Heron’s formula tells us that to maximize A we must maximize the product $(s - a)(s - b)(s - c)$.

Notice first that each of the factors in that product is indeed positive, e.g.,

$$s - a = \frac{1}{2}(a + b + c) - a = \frac{1}{2}(b + c - a) > 0,$$

because from the triangle inequality for nontrivial triangles (triangles with nonzero area) we have $b + c > a$. Now, from the AM-GM inequality, we have

$$\frac{(s - a) + (s - b) + (s - c)}{3} = \frac{3s - (a + b + c)}{3} = \frac{3s - 2s}{3} = \frac{s}{3} \ge \{(s - a)(s - b)(s - c)\}^{1/3},$$

with equality iff $(s - a) = (s - b) = (s - c)$, i.e., iff $a = b = c$. The term s/3 is a constant upper bound to the inequality, and so the area is maximized if a = b = c, and that’s the entire proof!
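Heron’s formula also makes a brute-force check easy. This sketch fixes an illustrative perimeter P = 3 (so the equilateral triangle has unit sides), sweeps a grid of side lengths a and b with c = P − a − b, and keeps the triangle of largest area. The helper `heron_area` returns 0 for degenerate or impossible side triples, which is my own convention for the scan.

```python
import math

def heron_area(a, b, c):
    """Heron's formula; returns 0 for degenerate or impossible triangles."""
    s = (a + b + c) / 2
    product = s * (s - a) * (s - b) * (s - c)
    return math.sqrt(product) if product > 0 else 0.0

P = 3.0        # fixed perimeter
steps = 300
best = (0.0, None)
for i in range(1, steps):
    for j in range(1, steps - i):
        a = P * i / steps
        b = P * j / steps
        c = P - a - b
        area = heron_area(a, b, c)
        if area > best[0]:
            best = (area, (a, b, c))

print(best)    # largest area sqrt(3)/4 at sides (1.0, 1.0, 1.0)
```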

As a third example of the AM-GM inequality solving a problem 
ordinarily thought to require calculus, consider the following ques- 
tion that probably appears in every calculus textbook ever written. 
A food can (with both ends sealed, of course) with the given vol- 
ume V is to have the shape of a right circular cylinder. What are the 
dimensions of the can (the radius r and the height h) so that the surface area is minimum? The “calculus way” to answer this is to write the surface area S and the volume as

$$S = 2\pi r^2 + 2\pi rh,$$

$$V = \pi r^2 h,$$

and then to eliminate h. Thus, $h = V/(\pi r^2)$, and so

$$S = 2\pi r^2 + 2\pi r\frac{V}{\pi r^2} = 2\pi r^2 + \frac{2V}{r}.$$

We minimize S (as we’ll see in chapter 4) by setting dS/dr to zero, i.e.,

$$\frac{dS}{dr} = 4\pi r - \frac{2V}{r^2} = 0,$$

which gives the solution for r. Thus, $V = 2\pi r^3$, or

$$h = \frac{V}{\pi r^2} = 2r.$$

That is, the height of the can with minimum surface area is equal to the diameter of the can.


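Here’s how the AM-GM inequality answers the same question. As before,

$$S = 2\pi(r^2 + rh) = 2\pi\left(r^2 + \frac{V}{\pi r}\right) = 2\pi\left(r^2 + \frac{V}{2\pi r} + \frac{V}{2\pi r}\right),$$

or, applying the AM-GM inequality with n = 3 to the three terms in the parentheses,

$$\frac{S}{6\pi} = \frac{1}{3}\left(r^2 + \frac{V}{2\pi r} + \frac{V}{2\pi r}\right) \ge \left(r^2\cdot\frac{V}{2\pi r}\cdot\frac{V}{2\pi r}\right)^{1/3} = \left(\frac{V^2}{4\pi^2}\right)^{1/3},$$

so that

$$S \ge 6\pi\left(\frac{V^2}{4\pi^2}\right)^{1/3} = 6\pi\left(\frac{V}{2\pi}\right)^{2/3}.$$

Thus, the surface area is never less than the constant $6\pi(V/2\pi)^{2/3}$, and is equal to that minimum value when $r^2 = V/(2\pi r)$, i.e., when $V = 2\pi r^3$, just as we found before (but before, we had to know how to calculate a derivative).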
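Both routes to the answer can be checked numerically. This sketch assumes an illustrative volume V = 1000 (units are arbitrary) and scans the radius on a fine grid. It then compares the best height-to-radius ratio with 2 and the smallest surface area found with the AM-GM bound $6\pi(V/2\pi)^{2/3}$.

```python
import math

def can_surface(r, V):
    """Surface area of a closed cylinder of radius r and volume V."""
    h = V / (math.pi * r**2)
    return 2 * math.pi * r**2 + 2 * math.pi * r * h

V = 1000.0
rs = [0.5 + i * 0.0005 for i in range(40000)]   # r from 0.5 to about 20.5
r_best = min(rs, key=lambda r: can_surface(r, V))
h_best = V / (math.pi * r_best**2)
bound = 6 * math.pi * (V / (2 * math.pi)) ** (2 / 3)

print(r_best, h_best / r_best)        # the ratio h/r comes out very close to 2
print(can_surface(r_best, V), bound)  # nearly equal, and never below the bound
```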

Now, here’s a little variation for you to play with: in the example 
just done, both ends of the can were sealed. Suppose instead that 
only the bottom end is sealed. For the same volume as before, what 
now is the relationship between r and h to minimize the surface
area, and what is the ratio of the new minimized surface area to the 
one just calculated? It should be obvious that the ratio is less than 
one, but how much less than one? Remember, no calculus! There are 
two ways for you to attack this problem. You can start over and use 
the AM-GM inequality, of course. More clever, however, is to use our 
previous result, by noticing that if we take two cans, each with only 
one end sealed, and butt the unsealed ends together, we get a can 
with both ends sealed! Either way, you should get the same answers. 
(The answers are at the end of this section.) 

We can use the AM-GM inequality to prove the following curious, 
and I think unobvious, fact: given two food cans of equal volume 
and equal height, one cylindrical and the other rectangular in shape, 
the cylindrical can will always have the smaller total surface area. To 
see this, observe that if V is the common volume, then, for either 
shape, we can write 


V = (area of bottom) × (height).


So, since the heights are also equal, then the areas of the bottoms 
(and tops) of the two shapes are equal, too. Thus, to decide which 
can shape has the smaller total surface area we need only to com- 
pare the vertical surface areas. To do that, let’s make the following 
definitions: 


$S_c$ = vertical surface area of a cylindrical can of radius r and height h, i.e., $S_c = 2\pi rh$,

$S_r$ = vertical surface area of a rectangular can with dimensions a × b × h, i.e., $S_r = 2ha + 2hb = 2h(a + b)$.
This means

$$S_r - S_c = 2h(a + b) - 2\pi rh = 2h[(a + b) - \pi r].$$

From the AM-GM inequality we have $(a + b) \ge 2\sqrt{ab}$, and so

$$S_r - S_c \ge 2h\left[2\sqrt{ab} - \pi r\right]$$

because I’ve replaced (a + b) with a quantity that is no larger. Now, since the volumes of the two cans are equal we can also write

$$V = \pi r^2 h = abh,$$

so that

$$\sqrt{ab} = \sqrt{\frac{V}{h}}, \qquad \pi r = \pi\sqrt{\frac{V}{\pi h}} = \sqrt{\pi}\sqrt{\frac{V}{h}}.$$

This gives us

$$S_r - S_c \ge 2h\left[2\sqrt{\frac{V}{h}} - \sqrt{\pi}\sqrt{\frac{V}{h}}\right] = 2h\sqrt{\frac{V}{h}}\left[2 - \sqrt{\pi}\right] > 0$$

because it is clear that $2 > \sqrt{\pi}$ (i.e., $4 > \pi$). So, no matter how you choose the various dimensions of the two cans, if they have equal volume and equal height then the cylindrical can will always have the smaller total surface area.
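A random search over box shapes is a quick sanity check on this result. In this sketch the helper `surface_gap` (a hypothetical name) returns $S_r - S_c$ for an a × b × h box and the cylinder with the same volume and height. The ranges, seed, and trial count are arbitrary choices. For a square box (a = b) the AM-GM step is an equality, so the gap is exactly $2h\sqrt{V/h}(2 - \sqrt{\pi})$.

```python
import math
import random

def surface_gap(a, b, h):
    """S_r - S_c for an a x b x h box and a cylinder of equal volume and height.
    The tops and bottoms have equal areas, so only the vertical sides differ."""
    V = a * b * h
    r = math.sqrt(V / (math.pi * h))      # cylinder radius from V = pi r^2 h
    return 2 * h * (a + b) - 2 * math.pi * r * h

random.seed(2)
gaps = [surface_gap(random.uniform(0.1, 10), random.uniform(0.1, 10),
                    random.uniform(0.1, 10)) for _ in range(10_000)]
print(min(gaps) > 0)   # -> True
```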

If we don’t require the two can shapes to have the same height, 
then it is no longer true that the cylindrical can will have the smaller 
surface area no matter what the dimensions may be. For example, suppose the rectangular can has dimensions 1 × 1 × π, for a volume of π. Its total surface area is then $2 + 4\pi \approx 14.57$. If the cylindrical can has a radius of r and height h, then for the same volume we have $\pi r^2 h = \pi$, or

$$r^2 h = 1,$$

and so

$$h = \frac{1}{r^2} \quad\text{and}\quad 2\pi rh = \frac{2\pi}{r}.$$

Its total surface area is

$$T = 2\pi r^2 + 2\pi rh = 2\pi r^2 + 2\pi\frac{1}{r} = 2\pi\left(r^2 + \frac{1}{r}\right).$$

It is clear that we could pick r to make T arbitrarily larger than $2 + 4\pi$.

But it is also true that, if we pick r to give the minimum surface area for the cylindrical can, that area will be smaller than $2 + 4\pi$. That is, differentiating T gives

$$\frac{dT}{dr} = 2\pi\left(2r - \frac{1}{r^2}\right),$$

which is zero when $r = (1/2)^{1/3}$, which gives

$$T_{\min} = 2\pi\left[\left(\frac{1}{2}\right)^{2/3} + 2^{1/3}\right] = 3\pi\,2^{1/3} \approx 11.87,$$

about 18.5% less than the surface area of the rectangular can.
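The numbers in this comparison are easy to reproduce. This sketch takes the 1 × 1 × π box from the text and the cylinder formula $T = 2\pi(r^2 + 1/r)$. It evaluates T at the stationary point $r = (1/2)^{1/3}$ and prints the percentage reduction.

```python
import math

def cylinder_area(r):
    """Total surface area of a cylinder of volume pi: h = 1/r^2, T = 2*pi*(r^2 + 1/r)."""
    return 2 * math.pi * (r**2 + 1 / r)

box_area = 2 + 4 * math.pi        # the 1 x 1 x pi rectangular can
r_star = 0.5 ** (1 / 3)           # where dT/dr = 0
T_min = cylinder_area(r_star)

print(round(box_area, 2), round(T_min, 2))             # -> 14.57 11.87
print(round(100 * (box_area - T_min) / box_area, 1))   # -> 18.5 (percent)
```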

As the final example of this section, let me show you how mathe- 
maticians of old could have solved yet another maximum problem. 
As shown in appendix B, using nothing but algebra (no calculus), 
a consequence of the AM-GM inequality is yet another inequality 
called the arithmetic mean-quadratic mean inequality (the AM-QM 
inequality): if $x_1, x_2, \ldots, x_n$ are n numbers, then

$$\frac{x_1 + x_2 + \cdots + x_n}{n} \le \sqrt{\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n}}, \qquad n > 1,$$

with equality iff $x_1 = x_2 = \cdots = x_n$. But the AM-GM inequality itself tells us that

$$(x_1x_2\cdots x_n)^{1/n} \le \frac{x_1 + x_2 + \cdots + x_n}{n}$$

with equality iff $x_1 = x_2 = \cdots = x_n$, and so

$$(x_1x_2\cdots x_n)^{1/n} \le \sqrt{\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n}}$$

with equality iff $x_1 = x_2 = \cdots = x_n$.
This general result has a very pretty geometric interpretation for n = 2, i.e., for

$$\sqrt{x_1x_2} \le \sqrt{\frac{x_1^2 + x_2^2}{2}}.$$

Suppose that $x_1^2 + x_2^2 = R^2$ (a constant). The equation $x_1^2 + x_2^2 = R^2$ is a circle (centered on the origin of the $x_1, x_2$ coordinate system) with radius R, and so $\sqrt{x_1x_2}$ is bounded from above by the constant $R/\sqrt{2}$. And since $4x_1x_2$ is the area of a rectangle inscribed in that circle, then that area is bounded from above by the constant $2R^2$, and that area is equal to $2R^2$ iff $x_1 = x_2$. That is, the inscribed rectangle of maximum area is the inscribed square.


The answers to the problem of the cylindrical can with minimum surface area, with just one end sealed, are

a. r = h

b. ratio of surface areas = $\frac{1}{2}\sqrt[3]{4} = 0.7937$.
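If you want to check these answers by machine, this sketch does so for an illustrative volume V = 1. For the open can, eliminating h gives $S = \pi r^2 + 2V/r$. The code evaluates both cans at their stated optima, confirms the open-can optimum with an independent grid search, and prints the ratio. The helper name `open_can_surface` is my own.

```python
import math

V = 1.0

def open_can_surface(r):
    """Bottom plus side of a can open at the top: pi r^2 + 2V/r (using h = V/(pi r^2))."""
    return math.pi * r**2 + 2 * V / r

# Closed can (from the text): minimum when V = 2*pi*r^3.
r_closed = (V / (2 * math.pi)) ** (1 / 3)
S_closed = 2 * math.pi * r_closed**2 + 2 * V / r_closed

# Open can: the stated optimum h = r means V = pi*r^3.
r_open = (V / math.pi) ** (1 / 3)
h_open = V / (math.pi * r_open**2)
S_open = open_can_surface(r_open)

# Independent brute-force check of the open-can minimizer.
r_grid = min((0.001 * i for i in range(1, 3000)), key=open_can_surface)

print(h_open / r_open)        # 1.0, up to round-off
print(S_open / S_closed)      # about 0.7937
print(0.5 * 4 ** (1 / 3))     # the same value: (1/2) times the cube root of 4
```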


1.6 Derivatives from Physics 


There are minimum/maximum problems of great interest that do 
contain derivatives, but not because we are going to set them equal 
to zero. They are present because, for example, the physics of the 
problem requires them. The actual determination of a minimum 
(or a maximum) of something in such problems, however, depends
on other sorts of arguments. So, for the penultimate section of this 
introductory chapter, let me take you through the details of one 
such problem that has derivatives aplenty because of the physics and 
not because of the mathematics. 
Figure 1.5. Vertical cannon shot.


Consider figure 1.5. There we have a cannon pointing straight up, directly away from the center of the earth (not drawn to scale!). If we fire the cannon, a shell is ejected with initial velocity $v_0$; it rises upward to some maximum height, stops, and then falls back down to the ground. It is clear that the larger $v_0$, the higher the shell will go before gravity brings its upward motion to a halt. We can show, in fact, that if $v_0$ is at least a certain critical minimum value, then the shell will not return to earth. That minimum value for $v_0$ is called the escape velocity.

If we measure distance from the center of the earth as r (r = 0 is the center, and r = R is the surface of the earth), then Newton’s second law of motion (force equals mass times acceleration) and his inverse-square law of gravity tell us that if we ignore air drag on the shell, then

$$m\frac{d^2r}{dt^2} = -\frac{GMm}{r^2}, \qquad r \ge R,$$

where: m = mass of the shell,
M = mass of the earth,
G = universal constant of gravitation.


The minus sign on the right side of the differential equation is 
present because increasing r is directed upward, while the gravi- 
tational force on the shell is in the opposite direction, downward 
toward the center of the earth. 

We can solve this second-order differential equation with the help 
of a powerful result from differential calculus called the chain rule 
(discussed in chapter 4): if we write v(r) as the velocity of the shell 
at distance r from the center of the earth, then by definition 


$$v = \frac{dr}{dt},$$

and so the acceleration of the shell is

$$\frac{d^2r}{dt^2} = \frac{dv}{dt} = \frac{dv}{dr}\frac{dr}{dt} = v\frac{dv}{dr}.$$

This reduces our original differential equation to the more tractable (with m canceled on both sides) equation

$$v\frac{dv}{dr} = -\frac{GM}{r^2}, \qquad r \ge R.$$

We can “separate the variables” in this equation and write

$$v\,dv = -GM\frac{dr}{r^2},$$

which is easily integrated to give

$$\frac{1}{2}v^2 = GM\frac{1}{r} + C,$$

where C is the so-called “constant of indefinite integration.” Now, since $v = v_0$ when r = R, then

$$\frac{1}{2}v_0^2 = GM\frac{1}{R} + C,$$

or

$$C = \frac{1}{2}v_0^2 - GM\frac{1}{R},$$

and thus

$$\frac{1}{2}v^2 = \frac{GM}{r} + \frac{1}{2}v_0^2 - \frac{GM}{R}.$$


If we define H as the shell’s maximum distance from the center of the earth, then, as by definition v = 0 when r = H, we have

$$0 = \frac{GM}{H} + \frac{1}{2}v_0^2 - \frac{GM}{R},$$

or

$$H = \frac{GM}{\dfrac{GM}{R} - \dfrac{1}{2}v_0^2}.$$

If $v_0 = 0$ then H = R, which simply states the obvious: if the shell “leaves” the cannon with zero initial velocity, then it doesn’t go anywhere! But as $v_0$ increases from zero, H increases from R and, obviously, as $\frac{1}{2}v_0^2$ approaches GM/R we see that H diverges to infinity, i.e., the shell does not return to earth. So, the minimum escape velocity is the initial velocity given by

$$v_0 = \sqrt{\frac{2GM}{R}}.$$


Any velocity greater than this also means the shell isn’t coming 
back, of course. 

We can express this result in the following interesting alternative way. When r = R, the gravitational force on the shell is simply what we call its weight at the surface of the earth, which is mg, where g is the acceleration of gravity at the surface. Thus,

$$mg = \frac{GMm}{R^2},$$

and so $GM = gR^2$. This gives the escape velocity as

$$v_0 = \sqrt{\frac{2gR^2}{R}} = \sqrt{2gR}.$$

Taking the earth’s radius as 3,950 miles, and g as 32.2 ft/sec², we have the escape velocity as

$$v_0 = \sqrt{2 \times 32.2 \times 3{,}950 \times 5{,}280}\ \text{ft/sec} \approx 36{,}649\ \text{ft/sec} \approx 6.94\ \text{miles/sec}.$$
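The arithmetic and the formula for H can be checked in a few lines. This sketch uses the text’s values (R = 3,950 miles, g = 32.2 ft/sec²) and the substitution GM = gR². The helper `max_distance` is a hypothetical name for the expression for H above; it returns infinity once the shell no longer comes back.

```python
import math

g = 32.2                      # ft/sec^2
R = 3950 * 5280               # earth's radius in feet
v_escape = math.sqrt(2 * g * R)
print(round(v_escape))                # -> 36649 (ft/sec)
print(round(v_escape / 5280, 2))      # -> 6.94 (miles/sec)

def max_distance(v0, GM, R):
    """H = GM / (GM/R - v0^2/2); infinite once v0 reaches the escape velocity."""
    denom = GM / R - v0**2 / 2
    return math.inf if denom <= 0 else GM / denom
```

For example, `max_distance(0.5 * v_escape, g * R * R, R)` comes out to 4R/3, so a shell fired at half the escape velocity rises only a third of an earth radius above the surface.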


This is not the way we send people into space, of course, as 
the initial acceleration of the shell (spaceship) from zero to almost 
seven miles per second over the length of a cannon barrel would be 
unsurvivable. (But see Jules Verne’s From the Earth to the Moon. In 
his 1865 novel, he proposed getting around the problem of shoot- 
ing men to the moon using a fantastic 900-foot-long cannon. It 
wouldn’t work, but it is clever.) But serious proposals have been made to put nonhuman payloads into orbit or on the moon, using super-high acceleration up to the escape velocity. Such accelerations would be achieved not with a cannon but, rather, with the far more exotic technology of electromagnetic launchers, which are in actual use today at several sophisticated roller coaster rides around the world.


1.7 Minimizing with a Computer


For the final two examples of this chapter, which return to the theme of the computer as a useful tool in extremal problems, suppose first that a man can walk n times faster than he can swim (it seems reasonable that n > 1, but I’ll not use that assumption in what follows). He wants to travel from A, on the edge of a circular lake with radius R (centered on point O), to C, also on the edge of the lake. C’s location is specified by the given angle β (measured from the diameter AOD), as shown in figure 1.6. His general strategy is to first swim along the chord AB, and then to walk the rest of the way along the lake’s edge from B to C. If his total travel time is T, then where should B be to minimize T?

If we denote by θ the central angle subtended by the man’s walk, then the isosceles triangle OAB (with the chord AB as its base) has equal base angles of α and a third angle of $\gamma = \pi - \theta - \beta$. Thus,

$$2\alpha + (\pi - \theta - \beta) = \pi\ \text{radians},$$

or

$$\alpha = \frac{1}{2}(\theta + \beta).$$

Figure 1.6. Crossing a circular lake in minimum time.


It is clear from figure 1.6 that the man’s swimming and walking distances are, respectively, $2R\cos\{\frac{1}{2}(\theta + \beta)\}$ and $R\theta$. So, if we call his swimming speed unity (in arbitrary units), then his walking speed is n and we have the total travel time as

$$T = 2R\cos\left\{\frac{1}{2}(\theta + \beta)\right\} + \frac{R\theta}{n} = R\left[2\cos\left\{\frac{1}{2}(\theta + \beta)\right\} + \frac{\theta}{n}\right].$$
Z n 
As a quick, partial check on this expression, notice that if $\beta = \pi$ radians (C = A) then we also have θ = 0 and T = 0, just as we should have (it doesn’t take any time to travel from where you are to where you are!).

Our problem then is simply this: given a value of β in the interval 0 to π (thus locating C), what θ minimizes T (thus locating B)? This is an easy question to study with the aid of a computer. Figure 1.7 shows how T varies with θ, for five values of n, with β = 0 (C is directly across the lake from A), and figure 1.8 assumes β = 90°. In both figures the constant scale factor of R in the expression for T has been ignored since it has no effect on the value of θ that gives an extremum in T.

The plots in the two figures contain a wealth of information. In figure 1.7, for example, the n = 1 and n = 1.5 curves have their minimum values at θ = 0 (the man should swim all the way from A to C), while the n = 2, n = 2.5, and n = 3 curves have their minimum values at θ = 180° (the man should walk all the way from A to C). The curves suggest that there is some value of n between 1.5 and 2 where either of the pure walk-only and swim-only strategies would give the minimum travel time. What is that critical value of n? A little thought should convince you it is $n = \pi/2 \approx 1.57$. The curves of figure 1.8 suggest the same general conclusion for β > 0, i.e., as n increases from unity the strategy for minimizing the total travel time begins as the pure strategy of swimming all the way and then switches to the pure strategy of walking all the way. Is this always true? That is, for any value of β, is it true that there is never a mixed strategy of walking and swimming that minimizes T? I’ll leave that for you to think about!

Figure 1.7. Total travel time across the lake, β = 0° (T/R plotted against θ, in degrees, from 0 to 180).

Figure 1.8. Total travel time across the lake, β = 90°.
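Here is a minimal sketch of the kind of computer study behind figures 1.7 and 1.8. It evaluates $T/R = 2\cos\{\frac{1}{2}(\theta + \beta)\} + \theta/n$ on a grid of allowed walking angles $0 \le \theta \le \pi - \beta$ and reports the minimizing θ. The grid resolution and function names are my own choices. For β = 0 it reproduces the switch from swimming all the way to walking all the way as n goes from 1.5 to 2.

```python
import math

def travel_time(theta, beta, n):
    """T/R = 2 cos((theta + beta)/2) + theta/n, for 0 <= theta <= pi - beta."""
    return 2 * math.cos((theta + beta) / 2) + theta / n

def best_theta(beta, n, steps=1800):
    """Grid search over the allowed walking angles theta."""
    thetas = [(math.pi - beta) * i / steps for i in range(steps + 1)]
    return min(thetas, key=lambda t: travel_time(t, beta, n))

for n in (1, 1.5, 2, 2.5, 3):
    print(n, round(math.degrees(best_theta(0.0, n))))
# With beta = 0: theta = 0 (swim all the way) for n = 1 and 1.5,
# theta = 180 degrees (walk all the way) for n = 2, 2.5, and 3.
```

Changing the first argument of `best_theta` lets you probe other positions of C, such as β = π/2 for figure 1.8.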

For my last example in this chapter, consider the following problem, which is superficially similar to the one just treated but offers some surprising complications (though not so many that we can no longer make a fruitful computer analysis). So, suppose now that the man is initially at point A on a beach with a right-angle bend, as shown in figure 1.9. The man wishes to travel from A to E in minimum time; at any point B, as he walks along the first section of beach toward C, he can enter the water and swim to D, where he exits the water and continues walking on the second section of beach to E. That is, he can “cut a corner” from one section of beach to the other. The lengths of the two sections of beach are a and b, as shown in figure 1.9.

FIGURE 1.9. Another water-crossing problem. (Walking at speed v₂ on land, swimming at speed v₁ in the water.)

It is not difficult to express the problem mathematically. If we write v₁ and v₂ for the man’s speeds while swimming and walking, respectively, and if x and y are the distances of points D and B from the corner of the beach (C), respectively, then the total travel time is a function of two variables:


$$T(x,y) = \frac{a-y}{v_2} + \frac{\sqrt{x^2+y^2}}{v_1} + \frac{b-x}{v_2} = \frac{(a+b)-(x+y)}{v_2} + \frac{\sqrt{x^2+y^2}}{v_1}.$$


Our problem, then, is to determine the values of x and y that minimize T for given values of a, b, v₁, and v₂.

The answer for v₁ > v₂, for any a and b, is physically obvious: x = b and y = a, i.e., the man swims the entire trip, because then he travels the straight-line path (the shortest possible path) from A to E at the greater speed. As argued before, however, swimming faster than he can walk isn’t very plausible, and the case of v₁ < v₂ is far more interesting (both physically and mathematically). Before continuing with the analysis of T(x, y), it is important to notice that, with a single exception, the values of x and y are independent, subject only to the constraints 0 ≤ x ≤ b, 0 ≤ y ≤ a. The single exception is that if either x or y is zero then so must be the other; this is because of the physically required continuous nature of a path from A to E.
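As a sketch, T(x, y) can be coded directly from the expression above, together with the feasibility rules just described. The function name travel_time and the sample speeds (v₁ = 6, v₂ = 5) are illustrative choices, not from the text.

```python
import math

def travel_time(x, y, a, b, v1, v2):
    """T(x, y) for the corner-cutting beach problem of figure 1.9.

    v1 is the swimming speed and v2 the walking speed; x = CD and y = CB are
    measured from the corner C.  The point must lie in the feasible set:
    0 <= x <= b, 0 <= y <= a, with x and y either both zero or both nonzero.
    """
    if not (0 <= x <= b and 0 <= y <= a):
        raise ValueError("(x, y) lies outside the feasible rectangle")
    if (x == 0) != (y == 0):
        raise ValueError("x and y must vanish together")
    walking = (a - y) + (b - x)   # the two stretches of beach
    swimming = math.hypot(x, y)   # the straight swim from B to D
    return walking / v2 + swimming / v1

# Swimming faster than walking (v1 > v2): swimming straight from A to E wins.
a, b, v1, v2 = 1.0, 2.0, 6.0, 5.0
print(travel_time(b, a, a, b, v1, v2))    # swim all the way
print(travel_time(0, 0, a, b, v1, v2))    # walk all the way
```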

Now, we could attack the problem of minimizing T(x, y) with the aid of rather sophisticated calculus, but that isn’t attractive for several reasons. First, that would be out of place so early in this book. Second, there is a very pretty geometric interpretation of the problem; indeed, you’ll see the same approach used later, when we get to linear programming in chapter 7. And third, the approach I’ll show you now makes great use of the sheer computational power of a computer.

To begin, all pairs of points (x, y) that satisfy the constraints 0 ≤ x ≤ b, 0 ≤ y ≤ a form what is called the set of feasible solutions. For our problem, this set is the rectangle shown in figure 1.10, with the understanding that the bottom edge (x > 0, y = 0) and the left vertical edge (x = 0, y > 0) are not included in the feasible solution set; the corner point (0, 0) is, however, in the feasible solution set. We want to find the point in the feasible solution set that minimizes T(x, y). Now, notice that we can write


$$v_1 v_2 T = v_1(a+b) - v_1(x+y) + v_2\sqrt{x^2+y^2},$$

or

$$\sqrt{x^2+y^2} = \left(\frac{v_1}{v_2}\right)(x+y) + U,$$

where

$$U = v_1 T - \left(\frac{v_1}{v_2}\right)(a+b).$$

FIGURE 1.10. Feasible solution set for the geometry of figure 1.9. (The corner point (0, 0) is included; the rest of the bottom and left edges is excluded.)


Since v₁, v₂, a, and b are given positive constants, it is clear that the minimization of T is equivalent to the minimization of U. This simple observation is the key to the analysis that follows.

The equation

$$\sqrt{x^2+y^2} = \left(\frac{v_1}{v_2}\right)(x+y) + U$$

defines a curve y = y(x) for any given U; as we vary U we will also vary the curve y = y(x). We wish to determine the minimum U that results in a curve that still passes through at least one point of the feasible solution set. Using a computer to draw these curves will give us all the insight we need to determine the minimizing U (= Uₘᵢₙ) and, hence, the minimized T (= Tₘᵢₙ):
$$T_{\min} = \frac{1}{v_2}(a+b) + \frac{1}{v_1}U_{\min}.$$


To plot y = y(x) as a function of U, it is convenient to change to polar coordinates:

$$x = r\cos(\theta), \qquad y = r\sin(\theta),$$

and so

$$\sqrt{r^2\cos^2(\theta) + r^2\sin^2(\theta)} = \left(\frac{v_1}{v_2}\right)[r\cos(\theta) + r\sin(\theta)] + U.$$


This is easily reduced to

$$r = \frac{U}{1 - \left(\frac{v_1}{v_2}\right)[\sin(\theta) + \cos(\theta)]},$$

where, of course, it is understood that the radius vector r (at polar angle θ from the origin to the arbitrary point (x, y) on the y(x) curve) is always nonnegative, i.e., r ≥ 0. That is, the numerator and the denominator must have the same sign.

For the remainder of this analysis, let’s assume that the numerator is nonnegative and the denominator is positive, i.e., that

$$U \ge 0, \qquad 1 - \left(\frac{v_1}{v_2}\right)[\sin(\theta) + \cos(\theta)] > 0.$$


Since f(θ) = sin(θ) + cos(θ) achieves a maximum value of √2 at θ = 45° (easily verified either by setting df/dθ = 0 or by simply plotting f(θ)), then as long as

$$\frac{v_1}{v_2} < \frac{1}{\sqrt{2}}$$

we will have r ≥ 0 for any U ≥ 0 for all values of the polar angle θ. That is, we are now dealing with a restrictive case of v₁ < v₂, namely v₁ < (1/√2)v₂.
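The claim about f(θ), and the resulting condition on v₁/v₂, are easy to spot-check numerically. This sketch samples the polar angle on a 0.01° grid (an arbitrary resolution) and tests whether the denominator of r stays positive for a few speed ratios with v₂ = 5; the function names are illustrative.

```python
import math

def f(theta):
    """f(theta) = sin(theta) + cos(theta)."""
    return math.sin(theta) + math.cos(theta)

# Sample the polar angle over a full turn on a 0.01-degree grid.
THETAS = [math.radians(k / 100) for k in range(36000)]

theta_max = max(THETAS, key=f)
print(f"max of f at {math.degrees(theta_max):.2f} deg, value {f(theta_max):.6f}")

def denominator_always_positive(v1, v2):
    """True if 1 - (v1/v2)(sin + cos) > 0 at every sampled polar angle."""
    k = v1 / v2
    return all(1 - k * f(t) > 0 for t in THETAS)

for v1 in (1.0, 2.0, 3.5, 4.0):
    # Compare the sampled test with the closed-form condition v1/v2 < 1/sqrt(2).
    print(v1, denominator_always_positive(v1, 5.0), v1 / 5.0 < 1 / math.sqrt(2))
```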
Returning to the original x, y coordinate system, we have the result we are after: the y = y(x) curve is the curve defined by

$$x = \frac{U\cos(\theta)}{1 - \left(\frac{v_1}{v_2}\right)[\sin(\theta) + \cos(\theta)]}, \qquad y = \frac{U\sin(\theta)}{1 - \left(\frac{v_1}{v_2}\right)[\sin(\theta) + \cos(\theta)]}.$$


We can see now that all U “does” is scale the plot. Indeed, in figure 1.11 you’ll find the curve y = y(x) for v₂ = 5 with four different values of v₁ (all satisfying the condition v₁ < (1/√2)v₂), for two values of U (solid for U = 1, dashes for U = 0.4). It is clear from these plots that y = y(x) is elliptical, and that as U decreases toward zero the curves shrink inward toward the lower-left corner point of the feasible solution set (drawn solid).
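The parametric form above can be evaluated directly. The sketch below (the function name and the 360-point sampling are arbitrary choices) generates points on the curve for U = 1 and U = 0.4 with v₁ = 1 and v₂ = 5, as in one panel of figure 1.11, and checks two things: every point satisfies the original equation, and changing U only rescales the curve about the origin.

```python
import math

def curve_points(U, v1, v2, samples=360):
    """Points on sqrt(x^2 + y^2) = (v1/v2)(x + y) + U, generated from the polar
    form r = U / (1 - (v1/v2)(sin + cos)).  Assumes U >= 0 and v1 < v2/sqrt(2),
    so the denominator is positive at every angle."""
    k = v1 / v2
    points = []
    for i in range(samples):
        theta = 2 * math.pi * i / samples
        r = U / (1 - k * (math.sin(theta) + math.cos(theta)))
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

v1, v2 = 1.0, 5.0
solid = curve_points(1.0, v1, v2)    # U = 1
dashed = curve_points(0.4, v1, v2)   # U = 0.4

# Each point satisfies the original (Cartesian) equation of the curve.
on_curve = all(math.isclose(math.hypot(x, y), (v1 / v2) * (x + y) + 1.0)
               for x, y in solid)
# Shrinking U from 1 to 0.4 scales every point by 0.4 toward the origin.
scaled = all(math.isclose(xd, 0.4 * xs, abs_tol=1e-12)
             and math.isclose(yd, 0.4 * ys, abs_tol=1e-12)
             for (xs, ys), (xd, yd) in zip(solid, dashed))
print(on_curve, scaled)
```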


FIGURE 1.11. Converging to the minimum-time solution as U vanishes. (Panels include v₁ = 1, v₂ = 5 and v₁ = 2, v₂ = 5.)
I have, to be specific, used a = 1 and b = 2 to define the feasible solution rectangle, but it should be obvious that the actual size of the rectangle doesn’t matter. That is, for any choice of the a and b values, the smallest nonnegative U is U = 0, which collapses the y = y(x) curve to the single point x = 0, y = 0. Since the smallest U gives the smallest T, T(0, 0) = (a + b)/v₂ is the minimum journey time.
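A brute-force grid search over the feasible set offers an independent check on this conclusion. This sketch uses a = 1 and b = 2 as in the text; the grid resolution and the second speed pair (v₁ = 6, v₂ = 5, the swim-faster case) are illustrative assumptions, and the excluded edges of the rectangle are skipped.

```python
import math

def travel_time(x, y, a, b, v1, v2):
    # T(x, y) = [(a + b) - (x + y)] / v2 + sqrt(x^2 + y^2) / v1
    return ((a + b) - (x + y)) / v2 + math.hypot(x, y) / v1

def grid_minimum(a, b, v1, v2, steps=200):
    """Brute-force search of the feasible set: the corner (0, 0) plus all grid
    points with x > 0 and y > 0.  Returns (T, x, y) at the smallest T found."""
    best = (travel_time(0.0, 0.0, a, b, v1, v2), 0.0, 0.0)
    for i in range(1, steps + 1):
        for j in range(1, steps + 1):
            x, y = b * i / steps, a * j / steps
            best = min(best, (travel_time(x, y, a, b, v1, v2), x, y))
    return best

a, b = 1.0, 2.0
print(grid_minimum(a, b, v1=1.0, v2=5.0))   # v1 < v2/sqrt(2): expect the corner (0, 0)
print(grid_minimum(a, b, v1=6.0, v2=5.0))   # v1 > v2: expect x = b, y = a
```

The grid result agrees with a one-line inequality: since √(x² + y²) ≥ (x + y)/√2, we have T(x, y) − T(0, 0) = √(x² + y²)/v₁ − (x + y)/v₂ ≥ (x + y)[1/(√2 v₁) − 1/v₂], which is positive for any other feasible point when v₁ < v₂/√2.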

Thus, somewhat surprisingly I think, if v₁ < (1/√2)v₂ then the man should walk the entire way, and that is so no matter what the dimensions of the beach are. So, we have solved the problem for the two cases of v₁ > v₂ (swim all the way) and v₁ < (1/√2)v₂ (walk all the way). What if (1/√2)v₂ < v₁ < v₂? I’ll leave that case for you to ponder!


How to Walk Out of the Woods 


Our lost hiker doesn’t know which way to go to walk directly back to his car, but he does know that the car is somewhere on the circumference of the circle, with a one-mile radius, centered on his present location. So, to ensure he returns to his car, he should first walk one mile in a randomly selected direction (if he is very lucky he’ll walk straight back along the radius that was his original path) and then walk along the circular (one-mile-radius) path centered on his starting point. Somewhere along that circular path is his car. The absolute maximum distance he’ll have to walk is the initial one-mile radius plus the 2π-mile circumference, i.e., 1 + 2π = 7.2832 miles.

This is a mathematician’s solution, of course, as it ignores 
the practical detail of just how one manages to walk along a 
circular arc in a densely wooded forest. Another setting for this problem, one that avoids that objection, is to have our lost soul be a fisherman in a rowboat one mile offshore, in a dense fog. Rowing in a circle is now “easy”; all the fisherman need do is to take one end of a rope, drop it overboard with a heavy anchor, measure the depth of the water, and then (with due regard for the depth) row away until enough rope has played out to put him a mile away. He can then, keeping the rope taut, swing in a circular path about his original position.

Now, here’s a new twist on this puzzle for you to think about. Is this solution the best one can do, where best means having the minimum maximum path length? The answer is no: there are paths with smaller maximum travel distances that, with certainty, return the fisherman to shore (this is not quite the same as getting back to the car itself, of course, but for both the hiker and the fisherman it is probably good enough!).

FIGURE 1.12. Geometry of the lost fisherman problem (showing the fisherman’s original position).

To see this, imagine our lost fisherman first picks some angle θ > 0, and then at random picks a direction that he assumes is the direct one-mile path to the shore. He then rows at angle θ to this line for a distance of √(1 + tan²(θ)), as shown in figure 1.12. That is, the triangle formed by his initial position, his new position, and the end of the one-mile path in the assumed direction to the shore is a right triangle. If the assumed direction to the shore happens to be correct, then his journey is over. Otherwise, he next rows along a circular path with radius √(1 + tan²(θ)) until the line from his original position to his present position is once again at angle θ with respect to the assumed one-mile path. That is, he rows along a circular path through an angle of 2π − 2θ radians. The original solution was sure to eventually return him to shore, and the original solution path lies entirely inside the new path, so it is clear from figure 1.12 that this new path will also eventually reach the shore. The maximum total length of this new path is


$$L(\theta) = \sqrt{1+\tan^2(\theta)} + 2\pi\sqrt{1+\tan^2(\theta)}\left(\frac{2\pi - 2\theta}{2\pi}\right) = [1 + 2\pi - 2\theta]\sqrt{1+\tan^2(\theta)}.$$


FIGURE 1.13. Proof that 2π + 1 is not the minimum path length. (Plot of the reduction in maximum path length versus θ.)

Notice that L(0) = 1 + 2π, the maximum length of the original solution. The astonishing result is that there are values of θ > 0 that do result in L(θ) < L(0)!

This claim is easily established by simply plotting the quantity L(0) − L(θ) versus θ, as shown in figure 1.13. (We might try setting the derivative of L(θ) to zero, of course, to calculate the value of θ that minimizes L(θ), but if you do that you’ll find you are led to a transcendental equation in θ, i.e., you will still need to use a computer; see section 4.5.) Figure 1.13 shows that, for θ = 16.61°, L(θ) is 0.2879 miles less than 7.2832 miles.
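Since √(1 + tan²(θ)) = sec(θ) for 0 ≤ θ < π/2, setting dL/dθ = 0 gives −2 sec(θ) + (1 + 2π − 2θ) sec(θ) tan(θ) = 0, i.e., the transcendental equation tan(θ) = 2/(1 + 2π − 2θ). The sketch below solves it by bisection and, as a cross-check, also scans L(θ) on a 0.001° grid; both numerical approaches and the function names are illustrative choices, not necessarily the method of section 4.5.

```python
import math

def max_path_length(theta):
    """L(theta) = (1 + 2*pi - 2*theta) * sqrt(1 + tan(theta)**2), theta in radians."""
    return (1 + 2 * math.pi - 2 * theta) * math.sqrt(1 + math.tan(theta) ** 2)

def stationarity(theta):
    """Zero where dL/dtheta = 0, i.e. where tan(theta) = 2 / (1 + 2*pi - 2*theta)."""
    return math.tan(theta) - 2 / (1 + 2 * math.pi - 2 * theta)

# Bisection on [0, pi/4]: stationarity(0) < 0 < stationarity(pi/4),
# and the function is increasing there, so the root is unique.
lo, hi = 0.0, math.pi / 4
for _ in range(60):
    mid = (lo + hi) / 2
    if stationarity(mid) < 0:
        lo = mid
    else:
        hi = mid
theta_star = (lo + hi) / 2

# Cross-check with a brute-force scan over 0 < theta < 45 degrees.
grid = [math.radians(k / 1000) for k in range(1, 45000)]
theta_scan = min(grid, key=max_path_length)

L0 = max_path_length(0.0)   # 1 + 2*pi, the original strategy
saving = L0 - max_path_length(theta_star)
print(f"theta* = {math.degrees(theta_star):.2f} deg (scan: {math.degrees(theta_scan):.2f} deg)")
print(f"L(0) = {L0:.4f}, saving = {saving:.4f} miles")
```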

Now, one final question for you—can our fisherman do 
even better? Is there a rowing path that has an even smaller 
maximum length that is still certain to get him to shore? The 
answer is again yes, and an analysis demonstrating that is 
given in appendix H—but don’t look until you’ve made an 
honest try. 


2. The First Extremal Problems


2.1 The Ancient Confusion of Length and Area 


Ancient mathematicians, the Greeks and the Egyptians of the several centuries before Christ, treated a number of questions of the type we are interested in. They included the isoperimetric problem (what closed curve of given length encloses the greatest area?) and such questions as how to determine the line of minimum length that joins a given point to a given curve. Apollonius of Perga (262-190 B.C.) gave many ingenious geometric constructions for the latter question in his work Conics, but such problems are now generally handled easily with calculus. I’ll not discuss Apollonius’ solutions here, then, but if you are curious, you can see how he reasoned in volume 2 of Thomas Heath’s classic work A History of Greek Mathematics (Oxford 1921, pp. 159-63).

At just about the same time, the great Archimedes (287-212 B.C.) tackled a fascinating problem concerning the volumes of the spherical caps cut off by planes passing through spheres of various radii, with the constraint that the caps all have the same surface area. (A spherical cap is the region of a sphere that lies above, or below, a plane that cuts through the sphere. Any plane passing through a sphere’s center, for example, divides the sphere into two equal spherical caps called hemispheres.) In his masterpiece De Sphaera et Cylindro (On the Sphere and the Cylinder), Archimedes showed that of all such equal-area caps it is the hemispherical cap that has the largest volume. Again, I won’t go into Archimedes’ geometric demonstration of this but, if curious, you can read how he did it in Thomas Heath’s 1897 book The Works of Archimedes (reprinted in 1953 by Dover Publications), pp. 88-90.

Yet another extremal problem of ancient origin is the geodesic problem in the plane (what is the curve of minimum length that joins two given points?). The answer was intuitively accepted by the ancients as a straight line, and we will accept it too, for the time being. In fact, however, the isoperimetric and the geodesic questions, all too easily dismissed by students as having “obvious” answers, are actually extremely deep questions that stretched brilliant minds to find and prove the answers. The answer to the first (a circle) defied a rigorous derivation until the nineteenth century (!), while the answer to the second was formally proven only a bit earlier (in the eighteenth century). The ancients “knew” the answers long before these modern proofs, of course, and their proofs are actually quite convincing. But they contain a common, very subtle flaw (by modern standards), the explanation of which I’ll save for the last section of this chapter.

To say that the ancients knew the answer to the isoperimetric 
problem, however, is not to say it was commonly known. There is, 
for example, an amusing passage in Book 4 of Polybius’ Histories (of 
the Greek world more than a century before Christ) that shows this. 
Titled “Computation of the size of Cities,” it reads: 


Most people judge the size of cities simply from their circumfer- 
ence. So that when one says that Megalopolis is fifty stades in 
circumference [about five miles] and Sparta forty-eight, but that 
Sparta is twice as large as Megalopolis, the statement seems in- 
credible to them. And when in order to puzzle them still more, 
one tells them that a city or camp with a circumference of forty 
stades may be twice as large as one the circumference of which is 
one hundred stades, this statement seems to them absolutely as- 
tounding. The reason of this is that we have forgotten the lessons 
we learnt as children. I was led to make these remarks by the fact 
that not only ordinary men but even some statesmen and com- 
manders of armies are thus astounded, and wonder how it is possi- 
ble for Sparta to be larger and even much larger than Megalopolis, 
although its circumference is smaller; or at other times attempt to 
estimate the number of men in a camp by taking into considera- 
tion its circumference alone. ...So much for those who aspire to 
political power and the command of armies but are ignorant of
such things and surprised by them. 


Polybius wrote that in the second century B.C., but the ignorance he was complaining about was difficult to overcome. Six hundred years later, for example, we find in a commentary written by the mathematically trained philosopher Proclus (on the first book of Euclid’s Elements) the following warning about the possibility of being short-changed by someone who has not forgotten his geometry lessons:


We often fail to watch out for [the error of equating area with 
perimeter] in the distribution of plots of land; and many persons 
have taken the larger of two plots and [improperly] got a repu- 
tation for justice as having chosen an equal portion because the 
sum of the boundaries is the same in both cases. 


Proclus gives the further interesting example of two isosceles trian- 
gles, one with sides 5, 5, and 6, and the other with sides 5, 5, and 
8. The unwary might assume the first to have the smaller area be- 
cause it has the smaller perimeter, but in fact a quick application 
of Heron’s area formula (or of the 3, 4, 5 right triangle geometry 
created by drawing the altitude to the longest side) shows that the 
two triangles have the same area (of 12). 
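Proclus’ example is a one-line computation with Heron’s formula; this sketch (the function name is illustrative) reproduces it for both isosceles triangles.

```python
import math

def heron_area(p, q, r):
    """Area of a triangle with sides p, q, r, by Heron's formula."""
    s = (p + q + r) / 2                      # semiperimeter
    return math.sqrt(s * (s - p) * (s - q) * (s - r))

for sides in [(5, 5, 6), (5, 5, 8)]:
    print(sides, "perimeter =", sum(sides), "area =", heron_area(*sides))
# (5, 5, 6) perimeter = 16 area = 12.0
# (5, 5, 8) perimeter = 18 area = 12.0
```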

With these words of Polybius and Proclus in mind, it is now 
easy to understand why the average Greek of ancient times found it 
paradoxical that the two triangles shown in figure 2.1 should have 
the same area (because they have the same base and equal height), 
even though the perimeter of A is clearly less than that of B.

FIGURE 2.1. Two triangles with equal areas but unequal perimeters.

Indeed, by sliding the vertex P of B arbitrarily far to either the left or right
along the upper dashed line we can increase the perimeter of B 
without bound, without changing the area of B. This increase in 
perimeter comes with a price, of course: we need a bigger and bigger 
“expanse” of the plane to contain B even though its area remains 
constant. What is still astonishing to this day is that it is possible to do this example one better: to have a closed, simple (i.e., non-self-intersecting) curve that bounds a finite area with an infinite perimeter in a finite region of the plane.

In 1906, for example, the Swedish mathematician Helge von 
Koch (1870-1924) published what has come to be called the “von 
Koch snowflake,” a closed curve of infinite length that lies totally 
within a finite region of the plane (and so encloses a finite area). The 
iterative construction of this astonishing curve is easy to describe. We start with an equilateral triangle, with sides of unit length. Then, as the first iteration, the middle third of each side is removed and replaced with the two other sides of an outward-pointing equilateral triangle with sides of length 1/3. Then, as the second iteration, the middle third of each side of the first-iteration curve is removed and replaced in the same way, using equilateral triangles with sides of length 1/9. And so on indefinitely, as suggested in figure 2.2; the von Koch snowflake is the curve that results as the number of iterations increases without limit.

To see the astonishing perimeter/area property of the von Koch snowflake, let’s make the following definitions. After the nth iteration, n ≥ 0,

Nₙ = number of sides,
ℓₙ = length of each side,
Lₙ = length of perimeter = Nₙℓₙ.

So, ℓ₀ = 1, N₀ = 3, and L₀ = 3. It is obvious that with each iteration the number of sides increases by a factor of 4 (inserting a triangle in the middle of a side turns one side into four sides: the original side is split into two sections, plus the two sides of the triangle itself). Since N₀ = 3, then

$$N_n = 3\cdot 4^n, \qquad n = 0, 1, 2, \ldots.$$

FIGURE 2.2a. von Koch snowflake iteration 0.
FIGURE 2.2b. von Koch snowflake iteration 1.
FIGURE 2.2c. von Koch snowflake iteration 2.
FIGURE 2.2d. von Koch snowflake iteration 3.

Also obvious is that with each iteration the length of a side decreases by a factor of 3. Since ℓ₀ = 1, then

$$\ell_n = 1\cdot\left(\frac{1}{3}\right)^n = \frac{1}{3^n}, \qquad n = 0, 1, 2, \ldots.$$


Thus, the perimeter after the nth iteration is

$$L_n = N_n\ell_n = 3\cdot 4^n\cdot\frac{1}{3^n} = 3\cdot\left(\frac{4}{3}\right)^n,$$

and so

$$\lim_{n\to\infty} L_n = \lim_{n\to\infty} 3\cdot\left(\frac{4}{3}\right)^n = \infty.$$
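The formulas for Nₙ, ℓₙ, and Lₙ can be tabulated exactly with rational arithmetic; this sketch (the function name is illustrative) uses Python’s fractions module so no rounding enters until the final display.

```python
from fractions import Fraction

def koch_perimeter(n):
    """(N_n, l_n, L_n) after the nth iteration of the von Koch construction, n >= 0."""
    sides = 3 * 4 ** n                   # N_n = 3 * 4**n
    side_length = Fraction(1, 3 ** n)    # l_n = 1 / 3**n
    return sides, side_length, sides * side_length   # L_n = N_n * l_n

for n in range(8):
    N, l, L = koch_perimeter(n)
    print(f"n={n}: N={N}, l={l}, L={L} (about {float(L):.3f})")
```

Each step multiplies the perimeter by exactly 4/3, so the printed values grow without bound, as the limit above says.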


It is probably obvious as well that the total area bounded remains finite as n → ∞, because each iteration results in an increasingly “crinkly” curve (so crinkly, in fact, that in the limit n → ∞ we can’t draw it!) through the use of ever smaller triangles. Certainly all of the iterative curves remain inside a circle with a radius of, say, 1, centered on the original triangle. But, to make sure this claim of finite area is clear, let’s calculate the precise area of the von Koch snowflake.

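To begin, observe that the area of an equilateral triangle with side lengths ℓₙ is, by Heron’s formula from chapter 1, with s = (1/2)(ℓₙ + ℓₙ + ℓₙ) = (3/2)ℓₙ, given by

$$\sqrt{s(s-\ell_n)(s-\ell_n)(s-\ell_n)} = \sqrt{\frac{3}{2}\ell_n\cdot\frac{1}{2}\ell_n\cdot\frac{1}{2}\ell_n\cdot\frac{1}{2}\ell_n} = \frac{\sqrt{3}}{4}\ell_n^2.$$

So, for example, if Aₙ is the area bounded after the nth iteration, then

$$A_0 = \frac{\sqrt{3}}{4}.$$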
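Now, with each iteration we clearly increase the enclosed area; from Aₙ₋₁ we increase to Aₙ by adding 3·4ⁿ⁻¹ equilateral triangles (a triangle for each side of the previous curve) with ℓₙ = 1/3ⁿ. (For example, with n = 1 we increase from A₀ to A₁ by adding three equilateral triangles, each with ℓ₁ = 1/3.) The area of each one of these added triangles is

$$\frac{\sqrt{3}}{4}\left(\frac{1}{3^n}\right)^2 = \frac{\sqrt{3}}{4}\cdot\frac{1}{9^n},$$

and so

$$A_n = A_{n-1} + 3\cdot 4^{n-1}\cdot\frac{\sqrt{3}}{4}\cdot\frac{1}{9^n} = A_{n-1} + \frac{1}{3}A_0\left(\frac{4}{9}\right)^{n-1}, \qquad n \ge 1.$$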
Writing this expression for Aₙ explicitly for the first few values of n ≥ 1, we see that

$$n = 1:\quad A_1 = A_0 + \frac{1}{3}A_0 = A_0\left[1 + \frac{1}{3}\right],$$
$$n = 2:\quad A_2 = A_1 + \frac{1}{3}A_0\left(\frac{4}{9}\right) = A_0\left[1 + \frac{1}{3} + \frac{1}{3}\left(\frac{4}{9}\right)\right],$$
$$n = 3:\quad A_3 = A_2 + \frac{1}{3}A_0\left(\frac{4}{9}\right)^2 = A_0\left[1 + \frac{1}{3} + \frac{1}{3}\left(\frac{4}{9}\right) + \frac{1}{3}\left(\frac{4}{9}\right)^2\right].$$

In general, then, we can write

$$\lim_{n\to\infty} A_n = A_0\left[1 + \frac{1}{3}\left\{1 + \frac{4}{9} + \left(\frac{4}{9}\right)^2 + \left(\frac{4}{9}\right)^3 + \cdots\right\}\right].$$

The expression in the braces is a geometric series, i.e.,

$$1 + x + x^2 + x^3 + \cdots = \frac{1}{1-x}, \qquad |x| < 1,$$
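and, with x = 4/9 in the expression for the limit of Aₙ, we have

$$\lim_{n\to\infty} A_n = A_0\left[1 + \frac{1}{3}\cdot\frac{1}{1 - 4/9}\right] = A_0\left[1 + \frac{3}{5}\right] = \frac{8}{5}A_0 = \frac{2\sqrt{3}}{5}.$$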
So, in the limit n → ∞, the von Koch process increases the initial enclosed area by just 60%, while increasing the initial perimeter to infinity. The von Koch snowflake also occupies a finite region of the plane, unlike triangle B in figure 2.1. Even Polybius and Proclus, I think, would have been astonished by the area/perimeter properties of the von Koch snowflake.
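The area recurrence can also be iterated exactly. Dividing through by A₀ = √3/4 leaves only rational numbers, so this sketch (the function name is illustrative) tracks Aₙ/A₀ with Python’s fractions module and shows it approaching 8/5.

```python
import math
from fractions import Fraction

def koch_area_ratio(n):
    """Exact A_n / A_0 from the recurrence
    A_n = A_{n-1} + 3 * 4**(n-1) * (sqrt(3)/4) * (1/3**n)**2,  A_0 = sqrt(3)/4."""
    ratio = Fraction(1)
    for k in range(1, n + 1):
        new_triangles = 3 * 4 ** (k - 1)              # one per side of the previous curve
        ratio += new_triangles * Fraction(1, 9 ** k)  # each has area A_0 / 9**k
    return ratio

for n in (1, 2, 3, 5, 10, 30):
    print(n, koch_area_ratio(n), float(koch_area_ratio(n)))

A0 = math.sqrt(3) / 4
print("limiting area:", 8 / 5 * A0, "=", 2 * math.sqrt(3) / 5)
```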


2.2 Dido’s Problem and the Isoperimetric Quotient 


The origin of the isoperimetric problem can be traced back to the 
legendary story of the Phoenician queen Dido, told by Virgil in his
Aeneid. In that tale of events supposed to have taken place in the
middle of the ninth century B.C., we read of Dido fleeing from her
brother Pygmalion, who has murdered her husband. Escaping by 
sea, she finally lands in North Africa in what today is called the Bay 
of Tunis. There she comes to an agreement with the local inhabitants 
that she may buy all the land that can be bounded by the hide 
of a bull. The locals must have thought that to be a great joke, 
but Dido had the last laugh; she cut the hide into a great many 
long, narrow strips and attached them end-to-end. Then, using the 
seashore (given as straight) as part of the boundary, she laid out the 
hide-strip to enclose the maximum possible area, which she “knew” 
would be in the shape of a semicircle. Thus was founded, so goes 
the legend, both the ancient city of Carthage as well as the problem 
of Dido. 

Carthage disappeared long ago (destroyed for the last time at the 
end of the seventh century A.D.), but the problem of Dido has re- 
mained one of the classics of mathematics: to find, among all possi- 
ble curves of fixed length that connect two points on another given 
curve, the one curve that bounds the largest area. For the original 
problem of this type the given curve is a straight line (the seashore) 
and, from the assumption that the solution curve is a semicircle, it
is then “easy to see” that the solution curve to the isoperimetric
problem is a circle. You’ll see how all this works by the end of this 
chapter. 


The “Problem of Dido” is also known, though less widely, as the “Problem of Hengist and Horsa.” The name comes from two German
brothers, mercenaries that British legend says were hired to 
squash a fifth-century-A.D. invasion by the Saxons that resulted 
from Rome’s withdrawal of its occupying legions because Rome 
itself was under attack (thus leaving Britain vulnerable). As pay- 
ment for their services the brothers asked “only” for all the 
land that could be bounded by the hide of an ox (of course, 
they then did as Dido, cutting the hide into many thin strips 
and forming a large circle). This legend is at least as bloody and 
deceitful as is Dido’s, but perhaps with a bit more romance— 
the tale serves as the prologue to the story of Merlin and King 
Arthur. The isoperimetric legend, like worldwide flood legends, 
seems to be common to many civilizations across both geog- 
raphy and time. 


We will generally not be very much interested here in the metaphysical musings of philosophers, but Aristotle’s passage in Book 2 of his De caelo (On the Heavens), where he argues for the circular motion of the stars, is provocative:


... the revolution of the heaven is the measure of all motions, be- 
cause it alone is continuous and unvarying and eternal, the mea- 
sure in every class of things is the smallest member, and the short- 
est motion is the quickest, therefore the motion of the heaven 
must clearly be the quickest of all motions. But the shortest path 
of those which return upon their starting-point is represented by the 
circumference of a circle [my emphasis] and the quickest motion is 
that along the shortest path. 


Did Aristotle write this because he knew the solution to the isoperi- 
metric problem? It certainly would seem so. 

The first mathematical attack on the isoperimetric problem is 
thought to have appeared in the work On Isometric Figures by the 
somewhat mysterious Greek mathematician Zenodorus. Very little of his life is known. Even when he lived is open to some debate, but most historians place him shortly after the time of Archimedes, i.e., in the second century B.C. Indeed, the Greek mathematical philosopher Simplicius, writing in the sixth century A.D. (in his commentary on De caelo), mentions a proof by Archimedes of the isoperimetric theorem, but many historians today believe that may be an error. Since Zenodorus wrote so soon after Archimedes, we might expect him to have had something to say about his predecessor’s work, but unfortunately On Isometric Figures has been lost to history, and our knowledge of its contents comes only from what later writers had to say: in particular, from the commentaries written by the fourth-century-A.D. Egyptians Theon of Alexandria (on Ptolemy’s Syntaxis mathematica, better known today as the Almagest) and Pappus of Alexandria (in his Mathematical Collection). In his work, Zenodorus is said to have shown a number of results, such as


1. the area of a regular n-gon is greater than the area of any 
other n-gon with the same perimeter; 

2. given two regular n-gons with the same perimeter, one with n = n₁ and the other with n = n₂ > n₁, the regular n₂-gon has the larger area.


From these two results it is easy to see that the circle (which can be 
thought of as a regular “infinity-gon”) with a given perimeter will 
have an area greater than any regular n-gon with the same perimeter. 

We can get a mathematical “feel” for these claims with the aid of what is called the isoperimetric quotient. This quantity, abbreviated I.Q., is defined for any closed curve as

$$\text{I.Q.} = \frac{A}{\pi\left(\frac{L}{2\pi}\right)^2} = \frac{4\pi A}{L^2},$$

where L is the perimeter of the curve and A is the area enclosed by the curve. This definition is motivated by the fact that the denominator in the first expression is the area of the circle with perimeter L, and so the I.Q. of that circle (actually, any circle) is 1. Thus, the isoperimetric theorem says all closed curves obey the inequality I.Q. ≤ 1, with equality iff the curve is a circle.
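As a sketch, the I.Q. is a one-line function of area and perimeter; the check below (function name illustrative) confirms that every circle scores exactly 1, independent of its radius, which is why the I.Q. measures shape rather than size.

```python
import math

def isoperimetric_quotient(area, perimeter):
    """I.Q. = 4*pi*A / L**2 for a closed curve with enclosed area A and perimeter L."""
    return 4 * math.pi * area / perimeter ** 2

# Every circle has I.Q. = 1, whatever its radius.
for r in (0.5, 1.0, 3.0):
    print(r, isoperimetric_quotient(math.pi * r ** 2, 2 * math.pi * r))
```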
A similar inequality can be written in three dimensions by using the claim that, for a given surface area A, it is the sphere that has the largest volume, V. That is,

$$\frac{V}{\frac{4}{3}\pi\left(\frac{A}{4\pi}\right)^{3/2}} \le 1,$$

with equality iff the three-dimensional body is a sphere. So, here’s a pretty little problem for you to play with. First, explain what the above inequality “means,” that is, where does it come from? Then, use it to derive the following interesting inequality: if x₁, x₂, ..., xₙ are n real numbers, then

$$\left(x_1^2 + x_2^2 + \cdots + x_n^2\right)^3 \ge \left(x_1^3 + x_2^3 + \cdots + x_n^3\right)^2.$$


The solution is at the end of this chapter (but don’t look until 
you spend at least a little effort on it!). 


It is instructive to calculate the numerical values of the I.Q. for 
some common curves, if only to see how they compare to unity. A 
semicircle with radius r, for example, has 


$$A = \frac{1}{2}\pi r^2,$$
$$L = 2r + \pi r = r(2+\pi),$$

and so its I.Q. is

$$\text{I.Q.} = \frac{4\pi\left(\frac{1}{2}\pi r^2\right)}{r^2(2+\pi)^2} = \frac{2\pi^2}{(2+\pi)^2} = 0.7467.$$
We can generalize this a bit by computing the I.Q. of an arbitrary sector of a circle with central angle θ (θ = π is the special case of the semicircle). Then,

$$A = \pi r^2\left(\frac{\theta}{2\pi}\right) = \frac{1}{2}\theta r^2,$$
$$L = 2r + 2\pi r\left(\frac{\theta}{2\pi}\right) = r(2+\theta).$$

Therefore, the I.Q. of the general circular sector is

$$\text{I.Q.} = \frac{4\pi\left(\frac{1}{2}\theta r^2\right)}{r^2(2+\theta)^2} = \frac{2\pi\theta}{(2+\theta)^2}.$$

We already know the value of the I.Q. for θ = π, but is that the largest possible value? The answer is no, and here’s why.

To maximize the circular sector I.Q. is equivalent to minimizing its reciprocal, i.e., to finding the value of θ that minimizes (because we can ignore the constant “2π” factor) the expression

$$\frac{(2+\theta)^2}{\theta} = \frac{4 + 4\theta + \theta^2}{\theta} = 4 + \theta + \frac{4}{\theta}.$$

And that problem is equivalent to minimizing θ + 4/θ, because we can ignore the constant additive 4. Now, the AM-GM inequality says that

$$\frac{\theta + \frac{4}{\theta}}{2} \ge \sqrt{\theta\cdot\frac{4}{\theta}} = 2,$$

with equality iff θ = 4/θ, i.e., iff θ = 2 radians. For this θ, the I.Q. of the circular sector is

$$\frac{2\pi(2)}{(2+2)^2} = \frac{\pi}{4} = 0.7854.$$
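A brute-force scan of the sector I.Q. confirms the AM-GM result. This sketch (function name and the 0.0001-radian step are illustrative choices) also evaluates the semicircle case θ = π for comparison.

```python
import math

def sector_iq(theta):
    """I.Q. of a circular sector with central angle theta (radians): 2*pi*theta / (2 + theta)**2."""
    return 2 * math.pi * theta / (2 + theta) ** 2

print("semicircle (theta = pi):", round(sector_iq(math.pi), 4))

# Scan 0 < theta < 2*pi in steps of 0.0001 radian.
thetas = [k / 10_000 for k in range(1, 62_832)]
best = max(thetas, key=sector_iq)
print("best theta:", best, " I.Q.:", round(sector_iq(best), 4), " pi/4:", round(math.pi / 4, 4))
```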


The I.Q.’s of Zenodorus’ regular n-gons are, of course, particularly interesting, and we would expect that as n → ∞ the I.Q. should approach unity (as the n-gon approaches a circle). To see that this is indeed what happens, consider figure 2.3, which shows one of the n congruent triangles into which a regular n-gon can be decomposed. The two equal sides of the triangle have unit length, and the central angle is α, where

$$\alpha = \frac{2\pi}{n}.$$

FIGURE 2.3. Triangular building block of a regular n-gon. (The apex of the triangle is at the center of the regular n-gon.)

If we denote the height of the triangle by h and the base by x, then

$$h = \cos\left(\frac{\alpha}{2}\right) = \cos\left(\frac{\pi}{n}\right),$$
$$x = 2\sin\left(\frac{\alpha}{2}\right) = 2\sin\left(\frac{\pi}{n}\right).$$
The area of the triangle is Aₜ, where

$$A_t = \frac{1}{2}hx = \cos\left(\frac{\pi}{n}\right)\sin\left(\frac{\pi}{n}\right),$$

and so the area of the regular n-gon is

$$A = nA_t = n\cos\left(\frac{\pi}{n}\right)\sin\left(\frac{\pi}{n}\right).$$

The perimeter of the regular n-gon is

$$L = nx = 2n\sin\left(\frac{\pi}{n}\right),$$
and so the I.Q. of the regular n-gon is

$$\text{I.Q.}_n = \frac{4\pi n\sin\left(\frac{\pi}{n}\right)\cos\left(\frac{\pi}{n}\right)}{4n^2\sin^2\left(\frac{\pi}{n}\right)} = \frac{\pi}{n}\cot\left(\frac{\pi}{n}\right).$$


In particular, the I.Q.’s for the equilateral triangle (n = 3), the square 
(n = 4), the regular pentagon (n = 5), and the regular hexagon 
(n = 6), are: 


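$$\text{I.Q.}_3 = \frac{\pi}{3}\cot\left(\frac{\pi}{3}\right) = 0.6046,$$
$$\text{I.Q.}_4 = \frac{\pi}{4}\cot\left(\frac{\pi}{4}\right) = 0.7854,$$
$$\text{I.Q.}_5 = \frac{\pi}{5}\cot\left(\frac{\pi}{5}\right) = 0.8648,$$
$$\text{I.Q.}_6 = \frac{\pi}{6}\cot\left(\frac{\pi}{6}\right) = 0.9069.$$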
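The closed form (π/n)cot(π/n) can be checked against the area and perimeter formulas it came from. This sketch (function names illustrative) computes both and prints the I.Q. for a few values of n, including large n where it creeps toward 1.

```python
import math

def regular_ngon_iq(n):
    """I.Q. of a regular n-gon, (pi/n) * cot(pi/n)."""
    x = math.pi / n
    return x / math.tan(x)

def regular_ngon_iq_from_geometry(n):
    """Same quantity built from A = n cos(pi/n) sin(pi/n) and L = 2n sin(pi/n)."""
    area = n * math.cos(math.pi / n) * math.sin(math.pi / n)
    perimeter = 2 * n * math.sin(math.pi / n)
    return 4 * math.pi * area / perimeter ** 2

for n in (3, 4, 5, 6, 12, 100, 1000):
    print(n, round(regular_ngon_iq(n), 4), round(regular_ngon_iq_from_geometry(n), 4))
```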


Thus, all regular n-gons except the first one (n = 3) have I.Q.’s that exceed that of the semicircle (about 0.7467); the I.Q. of the square is exactly equal to the I.Q. of the circular sector of maximum I.Q.; and these results suggest that $\lim_{n\to\infty}\text{I.Q.}_n = 1$, the I.Q. of the circle. Still, these numerical results in no way prove the isoperimetric theorem. To do that, we need much deeper arguments.
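The n-gon formula is easy to tabulate. The Python sketch below (the helper names are hypothetical) computes $\text{I.Q.}_n$ both from the closed form $(\pi/n)\cot(\pi/n)$ and directly from the expressions for $A$ and $L$, so the two derivations can be checked against each other, and then prints the values for growing n to show them creeping toward 1.

```python
import math


def ngon_iq(n: int) -> float:
    """Closed form: I.Q._n = (pi/n) * cot(pi/n)."""
    a = math.pi / n
    return a / math.tan(a)


def ngon_iq_from_parts(n: int) -> float:
    """Same quantity from A = n cos(pi/n) sin(pi/n) and L = 2n sin(pi/n)."""
    a = math.pi / n
    area = n * math.cos(a) * math.sin(a)
    perimeter = 2 * n * math.sin(a)
    return 4 * math.pi * area / perimeter ** 2


for n in (3, 4, 5, 6, 12, 100, 10_000):
    print(f"n = {n:>6}: I.Q. = {ngon_iq(n):.4f}")
```

Like the table above, this is numerical evidence only; it illustrates the limit but proves nothing about the isoperimetric theorem itself.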

As an aside, before we get into those arguments, it is interesting 
to note that the ancient question of how to tile the plane (how to 
divide an infinite two-dimensional surface into congruent n-gons) 
is intimately related to the concept of the I.Q. Pappus’ fame today, 
for example, is due at least in part to his speculation that bees 
make their honeycombs with hexagonal cells because that structure 
minimizes the total wax needed to store a given amount of honey in 
a regular array of cells (see appendix C). This so-called “honeycomb 
conjecture” defied a mathematical proof until very recently (1999), 
when the American mathematician Thomas Hales at the University 
of Michigan finally succeeded in finding one. The ancients were 
pretty good at formulating tough problems! 


A problem closely related to one that Zenodorus treated is that of inscribing the maximum-area N-gon in a given circle (of radius R). The answer is that the N-gon should be regular, and the proof by modern methods is elegant. With reference to figure 2.4, $\theta_n$ is the central angle subtended by the nth side of the N-gon; these angles divide the N-gon into N triangles (and so $\sum_{n=1}^{N}\theta_n = 2\pi$), and the nth triangle has area $A_n$.

FIGURE 2.4. Making a regular N-gon.

To find $A_n$, write x (half the base of the nth triangle) as

$$x = R\sin\left(\frac{1}{2}\theta_n\right),$$

and h (the height of the triangle) as

$$h = R\cos\left(\frac{1}{2}\theta_n\right).$$

Thus,

$$A_n = xh = R^2\sin\left(\frac{1}{2}\theta_n\right)\cos\left(\frac{1}{2}\theta_n\right),$$

or, from the trigonometric identity $\sin(\alpha)\cos(\alpha) = \frac{1}{2}\sin(2\alpha)$, we have

$$A_n = \frac{1}{2}R^2\sin(\theta_n),$$

and so the total area of the N-gon is

$$A = \sum_{n=1}^{N} A_n = \frac{NR^2}{2}\cdot\frac{1}{N}\sum_{n=1}^{N}\sin(\theta_n).$$


I’ll now use a result that is a special case of a general re- 
sult due to the self-taught Danish mathematician Johan L.W.V. 
Jensen (1859-1925), who spent his career not as an academic 
but rather as an engineer for the Copenhagen Telephone Com- 
pany (he eventually became Chief Engineer). In 1906 he pub- 
lished what has come to be known as Jensen’s inequality (you 
can find it stated and proven in appendix B); the special case 
of it that I’ll use here is 


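$$\frac{1}{N}\sum_{n=1}^{N}\sin(\theta_n) \le \sin\left(\frac{1}{N}\sum_{n=1}^{N}\theta_n\right) = \sin\left(\frac{2\pi}{N}\right),$$

with equality iff $\theta_1 = \theta_2 = \cdots = \theta_N$, i.e., when all of the central angles are equal, and so the N-gon of maximum area is the regular N-gon with area $(NR^2/2)\sin(2\pi/N)$.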
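Jensen’s conclusion can also be spot-checked numerically. The Python sketch below assumes a unit circle (R = 1), builds random inscribed hexagons by cutting the circle at random points (so the central angles are nonnegative and sum to $2\pi$), evaluates $A = \frac{R^2}{2}\sum\sin(\theta_n)$, and compares the best random area with the regular hexagon’s $(NR^2/2)\sin(2\pi/N)$. The sampling scheme, the seed, and the helper names are arbitrary illustrative choices.

```python
import math
import random


def inscribed_area(angles, radius=1.0):
    """A = (R**2 / 2) * sum(sin(theta_n)); this is the shoelace area, so it stays
    correct even if one central angle exceeds pi (that sine is then negative)."""
    return 0.5 * radius ** 2 * sum(math.sin(t) for t in angles)


def random_central_angles(n_sides, rng):
    """Nonnegative central angles summing to 2*pi, from n_sides - 1 random cut points."""
    cuts = sorted(rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_sides - 1))
    edges = [0.0] + cuts + [2.0 * math.pi]
    return [b - a for a, b in zip(edges, edges[1:])]


N = 6
rng = random.Random(1)  # fixed seed so runs are repeatable
regular = inscribed_area([2.0 * math.pi / N] * N)
best_random = max(inscribed_area(random_central_angles(N, rng)) for _ in range(10_000))
print(f"regular hexagon: {regular:.6f}, best of 10,000 random hexagons: {best_random:.6f}")
```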


It is immediately clear, before we get into details, that if there 
is a solution to the isoperimetric problem, then it must be what is 
called a convex figure. A convex figure is one in which, given any two points A and B (with the requirement that these points are either on the boundary edge of the figure or inside the figure), all the points on the chord AB are also either on the boundary edge
or inside the figure. More graphically, the boundary edge (a convex 
curve) of the figure has no indentations, and there are no holes in 
the figure. Figure 2.5 shows two examples of nonconvex figures. 

The reason why a nonconvex figure cannot be the solution to the isoperimetric problem is that it is always possible to transform such a figure into another figure (possibly still nonconvex) that has the same (or even less) perimeter and a larger area. For example, for $\varphi_2$ in figure 2.5, simply remove the hole and you have a new figure with more area and a smaller perimeter. For $\varphi_1$, reflect the indentation through the dashed tangent line (as shown), giving a new figure with increased area and the same perimeter. So, whatever the solution figure to the isoperimetric problem may be, it must be convex, and from now on we’ll limit our attention to convex figures.

FIGURE 2.5. Two nonconvex figures.

I’m not going to show you Zenodorus’ proofs here (if you’re curi- 
ous you can find them in volume 2 of Heath’s previously cited work 
A History of Greek Mathematics, pp. 207-12). Rather, I’ll show you a 
more “recent” geometric analysis from the nineteenth century, due 
to the Swiss mathematician Jakob Steiner (1796-1863). I should tell 
you that in 1789 Steiner’s fellow countryman Simon Lhuilier (1750- 
1840) also proved Zenodorus’ results in a manner quite different 
from the approach you’ll find in Heath’s book. It is Steiner’s beautifully
elegant 1842 arguments, however, first for the problem of Dido
and then the isoperimetric theorem, that are models of mathemati- 
cal ingenuity even if they do suffer from that subtle flaw I tantalized 
you about in the previous section. 

Before actually presenting proofs, however, I’ll conclude this sec- 
tion by addressing, one last time, the concept of duality (introduced 
in the previous chapter) as it relates to the isoperimetric theorem. 
The following two statements are logically equivalent, quite inde- 
pendent of whether or not they are actually true (they are true, but 
that will be established in the next section): 


A. Of all closed curves in a plane with equal perimeters, the 
circle bounds the largest area; 

B. Of all closed curves in a plane with equal areas, the circle has 
the smallest perimeter. 


To prove the claim of logical equivalency, I’ll first assume that A holds, and then show that B necessarily follows. To do this, begin by assuming that B does not follow (this will, as you’ll soon see, quickly lead to a contradiction, and so B must follow from A). Thus, contrary to B, let’s assume that for a given circle C with a given area, there is some other closed curve D with the same area but with a smaller perimeter.

So, imagine that we shrink C down to the smaller circle C′ that has a perimeter equal to that of D. Obviously the area of C′ is smaller than that of C, i.e., the area of C′ is smaller than the area of D. Thus, C′ and D have the same perimeter but it is D, not the circle C′, with the larger area, which contradicts A. This contradiction must be the result of our using the negation of B (that B does not follow from A), and so B must follow from A.

To complete this demonstration of duality we must next show the reverse, i.e., if we assume that B holds, then A necessarily follows. So, as before, let’s assume that A does not follow and, as before, we’ll be able to derive a contradiction. Thus, contrary to A, let’s assume that for a given circle C with a given perimeter there is some closed curve D with the same perimeter that has a larger area. We now imagine that C is expanded up to the larger circle C′ that has an area equal to that of D. Since we expanded C to get C′, the perimeter of C′ will be greater than the perimeter of C, i.e., the perimeter of C′ is greater than the perimeter of D. That is, C′ and D have the same area but it is D, not the circle C′, that has the smaller perimeter, which contradicts B. This contradiction must be the result of our using the negation of A (that A does not follow from B), and so A must follow from B.

None of this proves the isoperimetric theorem itself, however. 
What we need to do next is to show either the truth of A or of B 
(either one will do, of course, as the other will then logically follow). 
That’s our task in the next section. 


2.3 Steiner’s “Solution” to Dido’s Problem 


To show how Steiner arrived at his demonstration of the solution of 
the original problem of Dido, we need to establish two preliminary 
results in elementary geometry. The first one is a standard high 
school exercise, that of showing that any triangle inscribed in a 
circle, with a diameter as a side (the hypotenuse), is a right triangle. 
You can find a proof of this in any high school geometry text. The 
second result we’ll need is that, of all possible triangles with two 
sides of given length, the triangle of maximum area is the right 
triangle with the given sides as the perpendicular sides. This is very 
easy to show. With reference to figure 2.6, let x and y be the two given sides, with angle $\theta$ between them. The height of the triangle is then $h = x\sin(\theta)$ and the area of the triangle is

$$A = \frac{1}{2}yh = \frac{1}{2}xy\sin(\theta).$$

A is maximized, then, by making the factor $\sin(\theta)$ maximum, i.e., $\sin(\theta) = 1$, which means $\theta = 90^\circ$, and we are done.

FIGURE 2.6. Maximizing the area of a triangle.

Now we can follow Steiner’s solution to the original problem of 
Dido: what curve of given length joins two points on a given straight 
line so as to maximize the enclosed area? Let A and B be the two 
points on the given straight line L, as shown in figure 2.7, and 
suppose the solution curve C is not a semicircle. That means, by our first preliminary result, that there must be a point P on C such that angle APB ≠ 90°. (Actually, for us to conclude this we should really show that the circle is the only curve such that for any P the angle APB = 90°, but I’ll skip over this detail.) Figure 2.7 shows that the dashed chords AP and PB divide the area enclosed by L and C into the three regions $R_1$, $R_2$, and $R_3$.

Next, imagining AP and PB to be rigid rods “hinged” at P, with sliding contacts on L at A and B, let’s adjust either A or B (or perhaps both) to A′ and B′ in such a way that the angle A′P′B′ is a right angle (as shown in figure 2.8). That is, as we make the adjustments A → A′ and B → B′, P will move to some new point P′ such that the lengths AP and A′P′ are equal (and also the lengths BP and P′B′ are equal). I’ll call the resulting new curve C′; it is not necessarily a semicircle since there may be more points where the angle APB ≠ 90°. Since the two chord lengths are unchanged by these adjustments, we can place regions $R_1$ and $R_3$ on the chords A′P′ and P′B′, respectively, while region $R_2$ will adjust to the new region $R_2'$.

FIGURE 2.7. Steiner’s isoperimetric argument, part 1.

FIGURE 2.8. Steiner’s isoperimetric argument, part 2.

Now, by our second preliminary result, the area of $R_2'$ is greater than the area of $R_2$. So, we have taken an arbitrary curve C and transformed it into a curve C′ with the same perimeter that encloses (with L) an area greater than that enclosed by C and L. (We do have another potential objection here: how do we know that the $R_1$ and $R_3$ regions don’t “bump into” each other, i.e., overlap, during the adjustment? We don’t, but again I’ll pass over this detail.) The only curve C that does not allow such an area-increasing, perimeter-preserving transformation is the semicircle (as then there is no point P on C such that the angle APB ≠ 90°).

At this point Steiner believed he had constructed a purely geo- 
metric solution to the original problem of Dido, and he proceeded 
to attack the isoperimetric problem by next making the following 
preliminary observation: if $\varphi$ is the (convex) solution figure to the isoperimetric problem, then any chord joining two points on the boundary of $\varphi$ that bisects the perimeter (assuming such a chord exists) would also bisect the area. (The existence of such a perimeter-bisecting chord is demonstrated in appendix D.) This is so because if we suppose the area is not bisected, then we could take the larger area and reflect it through the chord. That would give us a new figure with the same perimeter and a larger area than that of $\varphi$, which is impossible because it is $\varphi$ (by assumption) that is the solution figure.
Now, given the solution figure $\varphi$ to the isoperimetric problem, we draw a chord that bisects the perimeter of $\varphi$ and, as shown above, the area of $\varphi$ as well. This bisection splits $\varphi$ into $\varphi_1$ and $\varphi_2$, where $\varphi_1$ and $\varphi_2$ have equal areas and equal perimeters (indeed, the bisecting chord is a common part of the boundaries of both $\varphi_1$ and $\varphi_2$). We clearly maximize the area of $\varphi$ by maximizing the (equal) areas of $\varphi_1$ and $\varphi_2$. Notice, however, that $\varphi_1$ and $\varphi_2$ are then each semicircular disks by Steiner’s original solution to the Dido question, and so $\varphi$ is a circular disk. So, concluded Steiner, the solution curve to the isoperimetric problem is a circle.


2.4 How Steiner Stumbled 


Steiner’s analysis is undeniably clever. His contemporary, however, 
the German mathematician Peter Dirichlet (1805-59), pointed out 
that in addition to the objections I mentioned in the previous sec- 
tion, Steiner had made the unstated assumption that there actually 
is, in fact, a solution to the isoperimetric problem. Most people reply 
to that objection by saying “Of course there is a solution—it’s obvious 
there is a solution!” That was Steiner’s reply, in fact, at least at first, 
but it ignores the fact that there are plenty of geometry questions 
that look at first glance like they should have solutions—but in fact 
do not. 

Consider, for example, the problem of finding that convex figure 
of greatest area among all convex figures with a perimeter less than 
1. The fact is that there is no solution to this problem; here’s why. Let $\varepsilon$ be some arbitrarily small positive number, and suppose that the convex figure $\varphi$ has perimeter $1 - \varepsilon$. Then, simply expand $\varphi$ up in scale to a new (similar) figure with the larger perimeter $1 - \frac{1}{2}\varepsilon$ (which of course is still less than 1). This new figure has an area greater than that of $\varphi$, and in fact we can repeat this process as many times as we
wish. That is, we can generate an endless sequence of convex figures 
all with perimeters less than 1 but with ever increasing areas; there 
is no “largest area” figure, and so we would be in error to a priori 
assume that there is a solution figure. 

An even more dramatic illustration of the danger in assuming the 
existence of a solution is the surprise answer to a problem that is of 
a nature opposite to the one just considered. Called the “Kakeya 
problem” after S. Kakeya (1886-1947), the Japanese mathematician
who posed it in 1917, it asks: what is the smallest area in which a line 
segment of unit length (in arbitrary units) can be rotated through 
360°? Virtually everybody believes, upon first hearing this, that 
there is a smallest area. Kakeya himself conjectured that the minimum area is $\pi/8$. In a paper published in 1928, however, the Russian mathematician Abram Besicovitch (1891-1970) showed that
no matter how long the line segment, there is no smallest area! That 
is, there does exist a figure with the area (for example) of the period 
at the end of this sentence in which a line segment one million light 
years long can be rotated through 360°. Besicovitch actually showed 
how to make such a figure (it’s nonconvex and very complicated— 
no surprise there!), and you can find an elementary discussion of 
just how to construct it in Besicovitch’s own words, in his paper 
“The Kakeya Problem” (American Mathematical Monthly, September 
1963, pp. 697-706). Besicovitch’s result shows that one must be very 
careful before assuming there is always a solution. 

I’ll close this section with the observation that the entire point
of one of the great, historically important maximum problems in 
pure number theory was to prove that there is no maximum. This 
is Euclid’s wonderful demonstration, in his Elements (Book 9), that 
there is no largest prime, i.e., that there is an infinity of integers 
with no factors other than themselves and unity. His elegant proof 
is perhaps the ultimate in simplicity. Suppose that there are only n primes, labeled $p_1, p_2, \ldots, p_n$. That is, $p_n$ is the largest prime. Then, form the new (obviously much larger) integer:

$$P = p_1 p_2 \cdots p_n + 1.$$


What can we say about P? 

By our assumption that $p_n$ is the largest prime, we conclude that P must not be prime. It therefore must be possible to write P as the product of primes (simply keep factoring the factors of P until all the factors are prime), but clearly none of the assumed finite number of primes divides P (because of that “+1”, dividing P by any $p_i$ leaves a remainder of 1). So, P is not factorable into a product of primes, which says P itself must be a prime. But that contradicts our assumption that $p_n\,(<P)$ is the largest prime.
The only way out of this swamp is to admit that our assumption is 
false and that there is no largest prime; there is an infinity of primes. 
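It is worth seeing why the argument is phrased the way it is. If we take a genuinely finite list of primes (here, arbitrarily, the first six) and form Euclid’s P, then P need not be prime; what is guaranteed is only that none of the listed primes divides it, so every prime factor of P is a prime missing from the list. The Python sketch below (simple trial division, with a hypothetical helper name `prime_factors`) shows this for $2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdot 13 + 1$.

```python
import math


def prime_factors(m: int) -> list:
    """Prime factors of m, with multiplicity, by simple trial division."""
    factors, d = [], 2
    while d * d <= m:
        while m % d == 0:
            factors.append(d)
            m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors


primes = [2, 3, 5, 7, 11, 13]
P = math.prod(primes) + 1   # Euclid's P = p1 * p2 * ... * pn + 1
print(P, prime_factors(P))
print([P % p for p in primes])  # each remainder is 1, as the "+1" guarantees
```

Here P = 30031 = 59 × 509: not itself prime, but both of its prime factors are new, which is exactly what the proof exploits when the list is assumed to contain every prime.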
Before leaving the primes, let me show you just one more “there
is no maximum” fact about primes that surprises most people when 
they first encounter it. Since the primes are infinite in number, 
then of course no matter how far up we go in the integers we 
will always keep finding them. But that doesn’t mean they occur 
in any sort of regular way. Far from it! Indeed, if we call $g(p_n)$ the gap between consecutive primes $p_n$ and $p_{n+1}$, i.e., if we write $g(p_n) = p_{n+1} - p_n - 1$, then in fact $g(p_n)$ has no maximum. There are always two consecutive primes such that the gap between them is as large as we like. For example, if g is to be at least $10^{100}$ (the famously huge googol), or if it is to be the even more impressive $10^{10^{100}}$ (the equally famous but stupendously larger googolplex), then there exist two consecutive primes that have a gap at least as large as those g’s.

The proof is direct: the production of a specific sequence of consecutive integers with a length g that is obviously free of primes (every integer in the sequence has a divisor other than 1 and itself). Simply take the desired value of g and form the sequence of consecutive integers of length g defined by

$$(g+1)! + 2,\quad (g+1)! + 3,\quad (g+1)! + 4,\quad \ldots,\quad (g+1)! + g + 1.$$

The first integer is divisible by 2, the second is divisible by 3, etc., etc., etc., and the last integer is divisible by g + 1 (each k from 2 to g + 1 divides both $(g+1)!$ and k). Since we could have started with g as any finite integer, there is no maximum g.
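The construction is easy to check by machine for small g. In the Python sketch below (helper names are hypothetical), `prime_free_run(g)` builds the g consecutive integers $(g+1)!+2, \ldots, (g+1)!+g+1$, and we confirm that each $(g+1)!+k$ is divisible by k and that no member of the run is prime. The choice g = 10 is arbitrary.

```python
import math


def is_prime(m: int) -> bool:
    """Trial-division primality test (fine for the modest numbers used here)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def prime_free_run(g: int) -> list:
    """The g consecutive integers (g+1)! + 2, ..., (g+1)! + g + 1."""
    base = math.factorial(g + 1)
    return [base + k for k in range(2, g + 2)]


g = 10
run = prime_free_run(g)
# (g+1)! + k is divisible by k for every k from 2 to g+1.
divisible = all((math.factorial(g + 1) + k) % k == 0 for k in range(2, g + 2))
print(run[0], run[-1], divisible, any(is_prime(m) for m in run))
```

The factorial run is usually far from the first gap of its length (the consecutive primes 113 and 127 already enclose 13 composites); its virtue is only that it works for every g.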


Even though there are arbitrarily large gaps between suc- 
cessive primes, it has also been shown that successive primes 
do obey a certain rule on when they must occur. In 1845 the 
French mathematician Joseph Bertrand (1822-1900) conjectured that for all n > 3 there is at least one prime between n and 2n − 2 (the conjecture is often stated in the alternative, and perhaps more elegant, form: for all n > 1 there is at least one prime between n and 2n). Bertrand’s conjecture was
proven in 1850, by the Russian mathematician Pafnuty Cheby- 
shev (1821-94). Arbitrarily large gaps are compatible with this 
result because large gaps occur only when the numbers in the 
gap are vastly greater than the gap length. 
In 1932 a much simpler proof of Bertrand’s conjecture was found by the Hungarian mathematician Paul Erdős (1913-96), when he was but an eighteen-year-old student in Budapest. When Erdős announced his proof, he accompanied it with the rhyme “Chebyshev said it, and I say it again, there is always a prime between n and 2n.”
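A finite check is no proof, but both forms of Bertrand’s statement are easy to test by machine. The Python sketch below sieves the primes up to $2N$ (the bound N = 10,000 and the helper name `prime_flags` are arbitrary choices) and verifies, for every n in range, that there is a prime p with $n < p < 2n - 2$ (for n > 3) and a prime p with $n < p < 2n$ (for n > 1).

```python
def prime_flags(limit: int) -> list:
    """flags[k] is True iff k is prime, for 0 <= k <= limit (sieve of Eratosthenes)."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, limit + 1, i):
                flags[j] = False
    return flags


N = 10_000
flags = prime_flags(2 * N)

# Bertrand's original form: for n > 3 there is a prime p with n < p < 2n - 2.
original_form = all(any(flags[p] for p in range(n + 1, 2 * n - 2)) for n in range(4, N + 1))
# The common form: for n > 1 there is a prime p with n < p < 2n.
common_form = all(any(flags[p] for p in range(n + 1, 2 * n)) for n in range(2, N + 1))
print(original_form, common_form)
```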


2.5 A “Hard” Problem with an Easy Solution 


Here’s an elegant solution to an interesting problem that occurs in 
many advanced books on calculus, which we can attack using the 
elementary concepts developed earlier in this chapter. Suppose we 
are presented with a length of string, which we are to cut into two 
pieces. Then, with those two pieces we are to form two figures with 
prescribed shapes, e.g., a square and a circle, or a half-circle and an equilateral triangle. How should we cut the string to minimize the total area of the two figures? Or, what if instead of just one cut and two figures we are more generally to cut the string $n - 1$ times and
then to form n figures (with prescribed shapes) enclosing minimum 
total area? The general question sounds like a tough problem (our 
first question is fairly easy) but, perhaps astonishingly, we can solve 
the general case easily, too, with the aid of the isoperimetric quotient 
(I.Q.) and Jensen’s inequality (see appendix B). 

First, recall from the first part of this chapter that every planar figure shape has an I.Q. that is independent of the figure’s actual size, defined as

$$\text{I.Q.} = \frac{4\pi A}{L^2},$$

where A is the area of the figure and L is the figure’s perimeter. If we write $1/\lambda_i$ as the I.Q. of the ith prescribed figure (and so $\lambda_i$ is a given), and if $A_i$ and $L_i$ denote the area and the perimeter, respectively, of that figure, then the total enclosed area is A, where

$$A = A_1 + A_2 + \cdots + A_n = \frac{L_1^2}{4\pi\lambda_1} + \frac{L_2^2}{4\pi\lambda_2} + \cdots + \frac{L_n^2}{4\pi\lambda_n}.$$


THE FIRST EXTREMAL PROBLEMS 63 


If we write L as the total uncut length of the string, then of course

$$L_1 + L_2 + \cdots + L_n = L, \quad \text{with all } L_i > 0,$$


and it is with this constraint that we wish to find the L; that min- 
imize A. This is the sort of problem usually treated with a calculus 
technique called Lagrange multipliers (discussed in chapter 6), but 
we can do it now with no calculus, using Jensen’s inequality. (The 
calculus approach is much faster, so don’t conclude that calculus 
isn’t important!) 

Applying Jensen’s inequality to the strictly convex function $f(x) = x^2$, we have (with all $c_i > 0$ and summing to one)

$$(c_1x_1 + c_2x_2 + \cdots + c_nx_n)^2 \le c_1x_1^2 + c_2x_2^2 + \cdots + c_nx_n^2,$$

with equality iff $x_1 = x_2 = \cdots = x_n$. So, define $c_i$ and $x_i$ as

$$c_i = \frac{\lambda_i}{\lambda_1 + \lambda_2 + \cdots + \lambda_n}, \qquad x_i = \frac{L_i}{\lambda_i}.$$

Note that $c_i > 0$ for every i and that, indeed, the $c_i$ sum to 1.
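Our inequality then becomes

$$\left(\sum_{i=1}^{n}\frac{\lambda_i}{\lambda_1+\lambda_2+\cdots+\lambda_n}\cdot\frac{L_i}{\lambda_i}\right)^2 \le \sum_{i=1}^{n}\frac{\lambda_i}{\lambda_1+\lambda_2+\cdots+\lambda_n}\left(\frac{L_i}{\lambda_i}\right)^2,$$

or, after some canceling,

$$\frac{(L_1+L_2+\cdots+L_n)^2}{(\lambda_1+\lambda_2+\cdots+\lambda_n)^2} \le \frac{1}{\lambda_1+\lambda_2+\cdots+\lambda_n}\left(\frac{L_1^2}{\lambda_1}+\frac{L_2^2}{\lambda_2}+\cdots+\frac{L_n^2}{\lambda_n}\right),$$

or, again after some canceling,

$$\frac{(L_1+L_2+\cdots+L_n)^2}{\lambda_1+\lambda_2+\cdots+\lambda_n} \le \frac{L_1^2}{\lambda_1}+\frac{L_2^2}{\lambda_2}+\cdots+\frac{L_n^2}{\lambda_n} = 4\pi A,$$

with equality iff $x_1 = x_2 = \cdots = x_n$. Since $L_1 + L_2 + \cdots + L_n = L$ is fixed, the left-hand side is a constant lower bound on $4\pi A$. So, A is minimized (becomes equal to its lower bound) when

$$\frac{L_1}{\lambda_1} = \frac{L_2}{\lambda_2} = \cdots = \frac{L_n}{\lambda_n} = k,$$

where k is a constant to be determined next. That is, when A is minimized, we have A equal to

$$A = \frac{L^2}{4\pi(\lambda_1+\lambda_2+\cdots+\lambda_n)} = \frac{1}{4\pi}\left(\frac{L_1^2}{\lambda_1}+\frac{L_2^2}{\lambda_2}+\cdots+\frac{L_n^2}{\lambda_n}\right),$$

or, since $L_i = k\lambda_i$ makes $L_i^2/\lambda_i = k^2\lambda_i$,

$$\frac{L^2}{4\pi(\lambda_1+\lambda_2+\cdots+\lambda_n)} = \frac{k^2}{4\pi}(\lambda_1+\lambda_2+\cdots+\lambda_n),$$

and so

$$k = \frac{L}{\lambda_1+\lambda_2+\cdots+\lambda_n}.$$

Thus,

$$\frac{L_1}{\lambda_1} = \frac{L_2}{\lambda_2} = \cdots = \frac{L_n}{\lambda_n} = \frac{L}{\lambda_1+\lambda_2+\cdots+\lambda_n},$$

or, at last,

$$L_i = \frac{\lambda_i}{\lambda_1+\lambda_2+\cdots+\lambda_n}\,L, \qquad i = 1, 2, 3, \ldots, n,$$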
which is the solution to the problem of how to cut the original string 
of length L into n pieces, each of length $L_i$. For example, suppose we are to cut the string into two pieces and use those pieces to form the circle and the half-circle that enclose minimum total area. As shown earlier, $\lambda_1$ (circle) $= 1/1 = 1$, and $\lambda_2$ (semicircle) $= 1/0.7467$. Thus,

$$L_1 = \frac{1}{1 + \frac{1}{0.7467}}\,L = \frac{0.7467}{1.7467}\,L = 0.4275\,L,$$


and so, to minimize the total area of the two figures, 42.75% of the 
string should go to the circle and the rest should go to the semicircle. 
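The recipe $L_i = \lambda_i L/(\lambda_1 + \cdots + \lambda_n)$ is easy to try out. The Python sketch below (function names are hypothetical, and a unit string length is assumed) computes the optimal split for the circle and semicircle example, taking the semicircle I.Q. as $4\pi A/L^2$ with $A = \pi r^2/2$ and $L = (\pi + 2)r$, i.e., $2\pi^2/(\pi+2)^2 \approx 0.7467$. It then compares the formula with a brute-force scan over the cut position, using $A_i = L_i^2/(4\pi\lambda_i)$.

```python
import math


def optimal_lengths(lams, total_length):
    """L_i = lambda_i * L / (lambda_1 + ... + lambda_n), where 1/lambda_i is shape i's I.Q."""
    s = sum(lams)
    return [lam * total_length / s for lam in lams]


def total_area(lengths, lams):
    """A = sum of L_i**2 / (4*pi*lambda_i), since I.Q._i = 4*pi*A_i / L_i**2 = 1/lambda_i."""
    return sum(li * li / (4 * math.pi * lam) for li, lam in zip(lengths, lams))


semicircle_iq = 2 * math.pi ** 2 / (math.pi + 2) ** 2   # about 0.7467
lams = [1.0, 1.0 / semicircle_iq]                       # circle, semicircle
L = 1.0

best = optimal_lengths(lams, L)
# Brute force: try cut positions x = 0.0001, 0.0002, ..., 0.9999.
scan = min((k / 10_000 for k in range(1, 10_000)), key=lambda x: total_area([x, L - x], lams))
print(f"formula: circle gets {best[0]:.4f} L; brute-force scan: {scan:.4f} L")
```

Both approaches put about 42.75% of the string into the circle, and `total_area(best, lams)` matches the lower bound $L^2/(4\pi(\lambda_1+\lambda_2))$.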


2.6 Fagnano’s Problem 


To end this chapter I’ll describe two fascinating examples of geomet- 
ric minimization; in the first the demonstration of the existence of 
a solution is explicit. To begin, consider the so-called “minimum- 
perimeter triangle of Fagnano.” This problem has its origins with the 
Italian mathematician Giulio Carlo Toschi di Fagnano (1682-1766),
who showed the existence part, and his priest-mathematician son 
Giovanni Francesco Fagnano (1715-97), who completed the mini- 
mization argument in 1775. The father’s contribution was to show, 
given any acute-angled triangle ABC (as shown in figure 2.9) and 
any given point U on one of the sides (BC in the figure), how to 
construct the inscribed triangle of minimum perimeter with a vertex at U. (You’ll soon see why the restriction to an acute triangle is necessary.) The son later showed how to pick U to select the absolute minimum-perimeter triangle. For this problem, then, there is no question about the existence of a solution. The son used the differential calculus to arrive at his answer, but the clever geometric proof that follows is due to the Hungarian mathematician Lipót Fejér (1880-1959), who discovered it while still a student at the University of Budapest.

FIGURE 2.9. Fagnano’s problem, part 1.

To see the existence of a solution for a given U, first connect U to vertex A to form line segment AU, and then “reflect” AU about the triangle sides AB and AC to form the line segments AU″ and AU′, respectively. Then, with W and V as arbitrary points on AB and AC, respectively, connect U″ to W and U′ to V. And finally, connect W, V, and U to form an inscribed triangle. By this construction it is clear that (in terms of length) WU″ = WU, and also that U′V = UV. Now, the perimeter of the inscribed triangle UVW is UV + VW + WU, but this is equal to U′V + VW + WU″, the length of the broken line connecting U′ to V to W to U″. The length of the broken line will be minimized when, instead of being broken, it is straight. That is, after reflecting AU about the sides AB and AC, we can determine the W and the V that minimize the perimeter of the inscribed triangle UVW by simply connecting U″ and U′ with a straight line and observing where that line intersects AB and AC, respectively, as shown in figure 2.10. Thus, we have found by construction the unique inscribed minimum-perimeter triangle UVW for a given U.

FIGURE 2.10. Fagnano’s problem, part 2.
The final part of the problem is to determine the particular U that gives the minimum of the minimums. Notice, first, that by construction the triangle AU′U″ is an isosceles triangle, with the angle at vertex A equal to $2\alpha + 2\beta = 2(\alpha + \beta)$, where $\alpha$ and $\beta$ are the angles BAU and UAC that AU makes with the sides AB and AC. That is, this angle is always twice the vertex A angle of the original ABC triangle, and so the vertex A angle of AU′U″ is the same for any choice of the point U. Thus, the “best” choice for U, the particular U that minimizes the perimeter of UVW, is the U that minimizes the equal-length sides AU″ and AU′ of the isosceles triangle AU′U″. That is so because, given an isosceles triangle with a fixed vertex angle at A, we minimize the base of that triangle (U″WVU′, equal to the inscribed triangle’s perimeter) by minimizing the lengths of the two equal sides of the isosceles triangle. But since reflection preserves length, AU″ = AU′ = AU, and so that simply says we pick U to minimize the length of AU, i.e., we should draw AU perpendicular to BC. In other words, when we have the best U, we have AU as the altitude from vertex A to the side BC.

If you look back at what we have done you'll see that the points 
W and V are uniquely determined, i.e., the minimum-perimeter 
inscribed triangle is unique. The immediate consequence of this is 
that we don’t have to go through all of the detailed steps of the 
proof (e.g., reflecting lines about other lines) to actually draw the 
minimum-perimeter triangle. This is because our choice of the side 
BC to work from was arbitrary—we could have started with side 
AC and found that the resulting line BU would be the altitude from 
vertex B to AC. Or we could have started with side AB and found 
that the resulting line CU would be the altitude from vertex C to 
AB. In the end, however, we would arrive at the same inscribed 
minimum-perimeter triangle because that triangle is unique. So, to 
actually construct UVW, simply draw the three altitudes and thereby 
immediately locate the points U, V, and W. The resulting inscribed 
triangle is called the pedal or orthic triangle of the original triangle 
ABC. You can also now see why ABC must be acute: it ensures that
all three altitudes are inside ABC, i.e., that U, V, and W lie on the 
sides of ABC and so the triangle UVW is truly an inscribed triangle. 
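The claim that the orthic triangle is the minimum-perimeter inscribed triangle can be spot-checked numerically. The Python sketch below takes one arbitrarily chosen acute triangle, locates the feet of the three altitudes (U on BC, V on AC, W on AB), and compares the orthic perimeter with the smallest perimeter found among many randomly chosen inscribed triangles. The coordinates, the sample count, the seed, and the helper names are all illustrative assumptions.

```python
import math
import random


def foot_of_altitude(p, a, b):
    """Foot of the perpendicular dropped from point p onto the line through a and b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    return (ax + t * dx, ay + t * dy)


def point_on(a, b, t):
    """The point a + t(b - a); for 0 <= t <= 1 it lies on side ab."""
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))


def perimeter(p, q, r):
    return math.dist(p, q) + math.dist(q, r) + math.dist(r, p)


A, B, C = (0.0, 0.0), (4.0, 0.0), (1.5, 3.0)   # an acute triangle
U = foot_of_altitude(A, B, C)   # altitude from A meets BC
V = foot_of_altitude(B, A, C)   # altitude from B meets AC
W = foot_of_altitude(C, A, B)   # altitude from C meets AB
orthic = perimeter(U, V, W)

rng = random.Random(2)  # fixed seed for repeatability
best_random = min(
    perimeter(point_on(B, C, rng.random()), point_on(A, C, rng.random()), point_on(A, B, rng.random()))
    for _ in range(20_000)
)
print(f"orthic perimeter {orthic:.6f}, best random inscribed {best_random:.6f}")
```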

As my final example to demonstrate that interest in geometric 
minimization did not cease with the ancients, consider the prob- 
lem of the “spanning circle of n points.” Imagine that you have n 
points positioned arbitrarily in the plane. We can measure the dis- 
tance between every possible pair of points and call the maximum 
distance d. There are, of course, just $n(n-1)/2$ such distances, and so computing the value of d is a straightforward matter. Now, suppose we wish to draw a circle whose interior contains all n points; such a circle is said to span the points. The problem is to determine the
smallest circle that spans the points. A practical form of this problem 
would be, for example, determining where to locate a fire station 
within a community to minimize the maximum distance from the 
fire station to any of the surrounding homes. It is clear, of course, 
that a circle with radius d spans the points; simply pick any one of 
the points as the center of a circle with radius d and observe that, 
by definition, no other point is more distant than d. 

Is it possible to construct a spanning circle that is smaller? Yes, indeed it is. A spanning circle with a radius no greater than $d/\sqrt{3} \approx 0.577d$ always exists. The proof is by elementary (but ingenious) geometry, and you can find it all worked out in the book by Hans Rademacher and Otto Toeplitz, The Enjoyment of Mathematics (Princeton University Press 1957, pp. 103-10). Even more on the minimum spanning circle, which dates back to at least 1860 and the work of the English mathematician J. J. Sylvester (1814-97), can be found in the book by Franco P. Preparata and Michael Ian Shamos, Computational Geometry (Springer-Verlag 1985, pp. 248-54). Those pages also discuss the dual problem: what is the largest empty circle inside the convex hull of the given n points (think of the points as vertical posts, and a rubber band snapped all around them, as shown in figure 2.11) that contains none of the points? That would tell us, for example, where to place an objectionable service facility for the town, e.g., a centrally located waste-treatment plant that nobody wants to live near!

FIGURE 2.11. A convex hull and its largest interior empty set.
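Returning to the spanning circle: for a small set of points, the smallest spanning circle can simply be found by brute force, because the optimal circle is always pinned down either by two of the points (as a diameter) or by three of them (as a circumcircle). The Python sketch below (helper names hypothetical; it takes time proportional to $n^4$, so it is only for small n) tries every such candidate, keeps the smallest one that contains all the points, and checks the result against the bounds $d/2 \le r \le d/\sqrt{3}$. The twelve random points and the seed are arbitrary.

```python
import itertools
import math
import random


def circle_from_two(p, q):
    """Circle with segment pq as a diameter, as (center_x, center_y, radius)."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, math.dist(p, q) / 2)


def circle_from_three(p, q, r):
    """Circumcircle of p, q, r, or None if the points are (nearly) collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), p))


def spans(circle, points, eps=1e-9):
    """True if every point lies inside or on the circle (with a small tolerance)."""
    x, y, radius = circle
    return all(math.dist((x, y), pt) <= radius + eps for pt in points)


def smallest_spanning_circle(points):
    """Brute force over all 2- and 3-point candidate circles (needs len(points) >= 2)."""
    candidates = [circle_from_two(p, q) for p, q in itertools.combinations(points, 2)]
    for triple in itertools.combinations(points, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    return min((c for c in candidates if spans(c, points)), key=lambda c: c[2])


rng = random.Random(3)
points = [(rng.random(), rng.random()) for _ in range(12)]
d = max(math.dist(p, q) for p, q in itertools.combinations(points, 2))
x, y, radius = smallest_spanning_circle(points)
print(f"d = {d:.4f}, smallest spanning radius = {radius:.4f}, ratio = {radius / d:.4f}")
```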


Solution to the Problem in Section 2.2 


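A sphere of radius r has surface area $A = 4\pi r^2$. Thus, $r = (A/4\pi)^{1/2}$, and the enclosed volume is

$$V = \frac{4}{3}\pi r^3 = \frac{4}{3}\pi\left(\frac{A}{4\pi}\right)^{3/2} = \frac{1}{6}\sqrt{\frac{A^3}{\pi}}.$$

The claim is that, for a given A, this V (for a sphere) is the largest possible. So, if V is the volume of any three-dimensional body, then

$$V \le \frac{1}{6}\sqrt{\frac{A^3}{\pi}},$$

with only the sphere achieving equality (see the end of this box).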

For the second part of this problem, suppose we have n spheres with radii $x_1, x_2, \ldots, x_n$. The total surface area and total enclosed volume are

$$A = 4\pi x_1^2 + 4\pi x_2^2 + \cdots + 4\pi x_n^2 = \sum_{i=1}^{n} 4\pi x_i^2,$$

$$V = \frac{4}{3}\pi x_1^3 + \frac{4}{3}\pi x_2^3 + \cdots + \frac{4}{3}\pi x_n^3 = \sum_{i=1}^{n}\frac{4}{3}\pi x_i^3.$$

Now, imagine that we glue all of these spheres together to form a (rather lumpy!) single body. This single body will obey the above inequality (which, after squaring and rearranging, becomes $A^3 \ge 36\pi V^2$). So,
$$\left(\sum_{i=1}^{n} 4\pi x_i^2\right)^3 \ge 36\pi\left(\sum_{i=1}^{n}\frac{4}{3}\pi x_i^3\right)^2,$$

which quickly reduces to

$$\left(\sum_{i=1}^{n} x_i^2\right)^3 \ge \left(\sum_{i=1}^{n} x_i^3\right)^2.$$


Notice that this argument only makes physical sense if all of the $x_i \ge 0$, because a physical sphere can’t have a negative radius. However, if one or more of the $x_i < 0$, it is clear that the left-hand side of the inequality is indifferent to the sign, while the right-hand side cannot become larger, because $\left|\sum x_i^3\right| \le \sum |x_i|^3$. That is, one or more $x_i < 0$ can only strengthen the inequality. Thus,

$$(x_1^2 + x_2^2 + \cdots + x_n^2)^3 \ge (x_1^3 + x_2^3 + \cdots + x_n^3)^2$$

for all real $x_i$.
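A random spot check of this final inequality, including negative values, takes only a few lines. The Python sketch below draws random real vectors (the lengths, the range, and the seed are arbitrary choices) and confirms that $(\sum x_i^2)^3 \ge (\sum x_i^3)^2$ in every trial, allowing a tiny relative tolerance for floating-point rounding; the one-element case shows that equality is possible.

```python
import random


def both_sides(xs):
    """Return ((sum x_i^2)^3, (sum x_i^3)^2) for a list of reals."""
    s2 = sum(x * x for x in xs)
    s3 = sum(x ** 3 for x in xs)
    return s2 ** 3, s3 ** 2


rng = random.Random(4)  # fixed seed so the run is repeatable
trials = [[rng.uniform(-5.0, 5.0) for _ in range(rng.randint(1, 8))] for _ in range(10_000)]
holds = all(lhs >= rhs * (1 - 1e-12) for lhs, rhs in map(both_sides, trials))
print("inequality held in every trial:", holds)
print("single sphere (equality):", both_sides([2.0]))
```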

Historical note: The entire argument of this box is based 
on the assumed truth of the three-dimensional isoperimetric 
theorem, i.e., on the inequality $A^3 \ge 36\pi V^2$. This was formally established in 1884 by the German mathematician H. A. Schwarz (1843-1921). The general n-dimensional isoperimetric inequality was later shown to be

$$A^n \ge \frac{n^n\pi^{n/2}}{\Gamma\left(\frac{n}{2}+1\right)}\,V^{n-1},$$

where $\Gamma$ is Euler’s generalization (with his gamma function integral) of the factorial function:

$$\Gamma(x) = \int_0^\infty e^{-t}\,t^{x-1}\,dt.$$


The general isoperimetric inequality was established in 1939 
by the German mathematician Erhard Schmidt (1876-1959). 
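The constant in the n-dimensional inequality is easy to check numerically. This Python sketch (helper names are mine, for illustration only) uses the standard library's `math.gamma` to compute nⁿπ^{n/2}/Γ(n/2 + 1), confirms that n = 2 gives the planar L² ≥ 4π·Area and n = 3 gives the 36π used in this box, and checks that an n-dimensional ball attains equality.

```python
import math

def unit_ball_volume(n):
    """Volume of the unit ball in n dimensions: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def isoperimetric_constant(n):
    """C_n in A**n >= C_n * V**(n - 1); equals n**n times the unit-ball volume."""
    return n ** n * unit_ball_volume(n)

def ball_ratio(n, radius):
    """A**n / V**(n - 1) for an n-ball; a ball should give exactly C_n."""
    volume = unit_ball_volume(n) * radius ** n
    area = n * unit_ball_volume(n) * radius ** (n - 1)  # surface measure = dV/dr
    return area ** n / volume ** (n - 1)

for n in (2, 3, 4):
    # C_n / pi: about 4 for n = 2 and about 36 for n = 3
    print(n, isoperimetric_constant(n) / math.pi, ball_ratio(n, 2.5) / math.pi)
```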


3. Medieval Maximization and Some Modern Twists


3.1 The Regiomontanus Problem 


After the ancient isoperimetric problems discussed in the previous
chapter, it seems that very little if anything new on
minimization/maximization theory appeared in mathematics for a very
long time. Indeed, not for another fifteen centuries after Christ! And
then, in 1471, the German mathematician Johann Müller (1436-76), more
commonly known today as Regiomontanus, posed a clever maximization
problem totally unlike any that had come before. I’ll state it here in
slightly more dramatic fashion than he did, but the basic problem
itself is as Regiomontanus conceived it.


A somewhat confusing trait of some of the medieval
mathematicians makes it appear that there were more of them than
there were: they often used more than one name. Johann Müller,
for example, was born in Königsberg (which means “King’s
Mountain”) and Latinized that to “Regio monte,” which soon
evolved into Regiomontanus. Two other famous examples of the
double-named syndrome are the Italians Leonardo of Pisa
(1170-circa 1250), also known as “Fibonacci,” and Niccolò Fontana
(1500-57), who was also called “Tartaglia.” So, we have six names,
but only three mathematicians.
FIGURE 3.1. Tartaglia’s cubic function.


As an aside on Tartaglia, a moderately interesting maximization
problem is due to him, dating from some time between 1556 and
1560; but it is, at heart, really only a slightly more sophisticated
version of Euclid’s ancient problem of dividing a number into two
parts to maximize their product. The Regiomontanus problem is far
more advanced, for two reasons: it is motivated by a physical
setting, and it requires the use of a trigonometric function. In
Tartaglia’s abstract, algebraic problem, we are to divide 8 into two
parts so that the product of their product and their difference is
maximized. Thus, if the parts are x and 8 − x, then we are to
maximize

$$x(8 - x)\left[x - (8 - x)\right] = -2x^3 + 24x^2 - 64x,$$

where of course 0 < x < 8. Tartaglia almost certainly structured the
statement of this problem with the intent of arriving at a cubic; see
my book An Imaginary Tale: The Story of √−1 (Princeton University
Press 1998), for Tartaglia’s part in the history of the cubic
equation. He did not reveal his method of solution, but he did
publish the correct answer: x = 4(1 + 1/√3) = 6.309401 (see figure
3.1). For how he might have reasoned, see V. M. Tikhomirov, Stories
about Maxima and Minima [translated from the Russian] (The American
Mathematical Society 1990, pp. 37-39).
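A brute-force check of Tartaglia's answer takes only a few lines today. This Python sketch is a plain grid search, not a reconstruction of how Tartaglia reasoned: it scans 0 < x < 8 and compares the best grid point with x = 4(1 + 1/√3). At that x we have x(8 − x) = 16 − 16/3 = 32/3 and 2x − 8 = 8/√3, so the maximum value should be 256/(3√3).

```python
import math

def tartaglia(x):
    """Product of the two parts of 8 (x and 8 - x) times their difference."""
    return x * (8 - x) * (x - (8 - x))

# Brute-force scan of 0 < x < 8 in steps of 1e-5.
x_best = max((i / 100_000 for i in range(1, 800_000)), key=tartaglia)

x_exact = 4 * (1 + 1 / math.sqrt(3))  # Tartaglia's published answer
print(f"grid maximum at x = {x_best:.5f}, exact x = {x_exact:.6f}")
print(f"maximum value = {tartaglia(x_exact):.4f}")
```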


A painting is hung flat against an art museum wall, with its bottom
and top edges at distances a and b, respectively, from the floor.
That is, the vertical dimension of the painting is (b − a). The
painting is viewed by a tourist whose eye level is distance h from
the floor, where h < a. That is, the picture is hung high on the wall
to keep the front of a crowd of tourists from blocking the view of
those in the back. How far from the wall should a tourist stand to
maximize the viewing angle subtended at his eye by the painting,
i.e., so that the painting appears as large as possible? Figure 3.2
shows the geometry of the problem, and introduces our notation.
(Note that the figure also makes clear that the condition h > a leads
immediately, by inspection, to the “uninteresting” result that, to
maximize his viewing angle, the tourist should stand at x = 0, i.e.,
with his nose pushed hard into the painting! The geometry says the
“viewing angle” is 180°, but it seems clear he would not enjoy the
view.)

Mathematically, our problem is simply that of determining the
value of x that maximizes the angle θ = α − β. Today this problem
is popular with the authors of calculus textbooks, but the year


FIGURE 3.2. Regiomontanus’ hanging picture (one-dimensional).

1471 was still a couple of hundred years short of the beginnings of
the differential calculus. That’s why the original solution (it is not 
known if it is due to Regiomontanus himself) was in the form of a 
complicated (in my opinion) geometric construction; you can find 
it discussed in the book by Ivan Niven, Maxima and Minima without 
Calculus (The Mathematical Association of America 1981, pp. 71- 
72). What I’ll show you here, instead, is a very clever noncalculus 
solution that is only slightly more general than the one given in 
Eli Maor’s Trigonometric Delights (Princeton University Press 1998, 
pp. 46-48). Later, in section 3.5, I’ll make the solution a bit more 
realistic (and complicated, which will require a computer to give us 
numerical results). 

The essential idea is to maximize tan(θ) = tan(α − β), which
is equivalent to our problem of maximizing θ = α − β since, for
0° < θ < 90°, the tangent function monotonically increases as its
argument increases. We begin, then, with the trigonometric identity


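$$\tan(\theta) = \tan(\alpha - \beta) = \frac{\tan(\alpha) - \tan(\beta)}{1 + \tan(\alpha)\tan(\beta)},$$

where, from figure 3.2, we have

$$\tan(\alpha) = \frac{b - h}{x}, \qquad \tan(\beta) = \frac{a - h}{x}.$$

So,

$$\tan(\theta) = \frac{\dfrac{b - h}{x} - \dfrac{a - h}{x}}{1 + \dfrac{b - h}{x}\cdot\dfrac{a - h}{x}} = \frac{(b - a)x}{x^2 + (b - h)(a - h)}.$$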
To maximize tan(θ), and hence θ itself, I’ll use the trick of
minimizing its reciprocal, i.e., let’s examine the function

$$\frac{1}{\tan(\theta)} = \frac{x^2 + (b - h)(a - h)}{(b - a)x} = \frac{x}{b - a} + \frac{(a - h)(b - h)}{x(b - a)}$$

and ask: for what value of x do we have a minimum?




To answer that question, recall the AM-GM inequality. For any
two positive numbers, y₁ and y₂, the AM-GM inequality says
y₁ + y₂ ≥ 2√(y₁y₂), with equality iff y₁ = y₂. Thus, setting

$$y_1 = \frac{x}{b - a}, \qquad y_2 = \frac{(a - h)(b - h)}{x(b - a)},$$

we have

$$\frac{1}{\tan(\theta)} = \frac{x}{b - a} + \frac{(a - h)(b - h)}{x(b - a)} \ge 2\sqrt{\frac{(a - h)(b - h)}{(b - a)^2}},$$

with equality iff

$$\frac{x}{b - a} = \frac{(a - h)(b - h)}{x(b - a)}.$$


That is, 1/tan(θ) is never less than the constant
[2/(b − a)]√((a − h)(b − h)) and is equal to that constant iff
x = √((a − h)(b − h)). This value of x minimizes 1/tan(θ) or, to say
the equivalent, maximizes tan(θ), which means θ itself is maximized.
For this value of x, the value of tan(θ) is (using the expression for
tan(θ) given in the previous paragraph)

$$\tan(\theta) = \frac{(b - a)\sqrt{(a - h)(b - h)}}{\left(\sqrt{(a - h)(b - h)}\right)^2 + (b - h)(a - h)} = \frac{(b - a)\sqrt{(a - h)(b - h)}}{2(a - h)(b - h)} = \frac{b - a}{2\sqrt{(a - h)(b - h)}}.$$


So, the answer to the Regiomontanus problem is that the tourist
should stand away from the wall by the distance

$$x = \sqrt{(a - h)(b - h)}$$

and, at that distance, he will experience the maximum viewing
angle of

$$\theta_{\max} = \tan^{-1}\left[\frac{b - a}{2\sqrt{(a - h)(b - h)}}\right].$$




A special, amusing case of interest is that of the “bug’s-eye view,”
with h = 0. Then,

$$x = \sqrt{ab}, \qquad \theta_{\max} = \tan^{-1}\left[\frac{b - a}{2\sqrt{ab}}\right].$$


For example, if we have a large painting hung such that a = 8 feet
and b = 20 feet, then a bug on the floor should position itself at a
distance of

$$\sqrt{8 \cdot 20}\ \text{ft} = 12.65\ \text{ft}$$

and, at that distance from the wall, it will enjoy a viewing angle of

$$\theta_{\max} = \tan^{-1}\left[\frac{20 - 8}{2\sqrt{160}}\right] = 25.4^\circ.$$


As a less whimsical example, suppose a six-foot-tall adult comes
to the museum with his three-foot-tall child. To maximize their
individual views of that same painting, each will of course stand
at a different distance and, perhaps even more interestingly, each
will experience a significantly different maximized viewing angle.
So, the optimal viewing distances for each are

$$\text{adult: } x = \sqrt{(8 - 6)(20 - 6)}\ \text{ft} = 5.29\ \text{ft}$$
$$\text{child: } x = \sqrt{(8 - 3)(20 - 3)}\ \text{ft} = 9.22\ \text{ft}$$

while the maximized viewing angles for each are

$$\text{adult: } \theta_{\max} = \tan^{-1}\left[\frac{20 - 8}{2\sqrt{28}}\right] = 48.6^\circ$$
$$\text{child: } \theta_{\max} = \tan^{-1}\left[\frac{20 - 8}{2\sqrt{85}}\right] = 33.1^\circ.$$

The adult sees a nearly 50% larger painting (in the vertical direction)
than does the child.
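The closed-form answer and the three examples can be cross-checked in a few lines. This Python sketch (function names are illustrative) evaluates x = √((a − h)(b − h)) and θ_max directly, and independently scans distances from 0.01 to 40 feet using θ = tan⁻¹((b − h)/x) − tan⁻¹((a − h)/x); the scan range and step are arbitrary choices.

```python
import math

def viewing_angle(x, a, b, h):
    """theta = alpha - beta, the angle the painting subtends at distance x (radians)."""
    return math.atan((b - h) / x) - math.atan((a - h) / x)

def best_viewing(a, b, h):
    """Closed-form answer for h < a < b: (distance, maximum angle in degrees)."""
    x = math.sqrt((a - h) * (b - h))
    theta = math.atan((b - a) / (2 * x))
    return x, math.degrees(theta)

# Painting from a = 8 ft to b = 20 ft; eye heights from the examples above.
for label, h in [("bug", 0), ("child", 3), ("adult", 6)]:
    x_star, theta_star = best_viewing(8, 20, h)
    # Independent check: scan distances from 0.01 ft to 40 ft.
    x_scan = max((i / 100 for i in range(1, 4001)),
                 key=lambda x: viewing_angle(x, 8, 20, h))
    print(f"{label:5s}: x = {x_star:.2f} ft (scan: {x_scan:.2f} ft), "
          f"max angle = {theta_star:.1f} deg")
```

The scan should land on the same distances to within its 0.01-foot resolution.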

An amusing little twist on these calculations appeared in the May 
1984 issue of the American Journal of Physics. There, as a challenge 
problem for readers, the question called for the calculation of the
distance a man (wearing trousers of length 2) should stand away 
from a dressing room mirror to have the best view of his trousers, 
if his eyes are distance h above the floor. There was no historical 
discussion given, but you can now see that it is just a slight variation 
on the original Regiomontanus problem. It was solved in the AJP 
using calculus, but it is easily handled with the AM-GM inequality, 
just as in the last analysis. See if you can do it (the answer is at the 
end of this chapter). 


3.2 The Saturn Problem 


A very interesting, somewhat more complicated, variation on the 
original Regiomontanus problem is the not so well known Saturn 
problem. It doesn’t yield to the AM-GM inequality, but we will still 
be able to find a pretty solution. For this new problem, the viewer 
is imagined to be on the surface of a (spherical) planet that has a 
ring—which, for the solar system, of course means Saturn. If we 
further imagine that we measure latitude upward from the plane 
that contains the ring (see figure 3.3), then the latitude α increases
from 0° in the ring plane up to 90° at the geographical north pole. 


FIGURE 3.3. Geometry of the Saturn problem.
If we denote the radius of the planet by r, and the outer and inner
radii of the ring by a and b, respectively, then our problem is to find
that value of α at which the observed angular width θ of the ring
is maximum. This value, of course, actually determines two circles
of latitude all around the planet that give the maximum viewing
angle, one above the ring plane (as shown in figure 3.3) and one
symmetrically positioned below the ring plane.

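The geometry of the Saturn problem is really quite straightforward.
In the notation of figure 3.3 (ℓ₁ and ℓ₂ being the distances from the
observer to the inner and outer edges of the ring), we have, from a
triple application of the law of cosines,

$$(a - b)^2 = \ell_1^2 + \ell_2^2 - 2\ell_1\ell_2\cos(\theta),$$
$$\ell_2^2 = r^2 + a^2 - 2ra\cos(\alpha),$$
$$\ell_1^2 = r^2 + b^2 - 2rb\cos(\alpha).$$

So,

$$\cos(\theta) = \frac{\ell_1^2 + \ell_2^2 - (a - b)^2}{2\ell_1\ell_2} = \frac{r^2 + b^2 - 2rb\cos(\alpha) + r^2 + a^2 - 2ra\cos(\alpha) - (a - b)^2}{2\ell_1\ell_2}$$
$$= \frac{2r^2 + a^2 + b^2 - 2r\cos(\alpha)(a + b) - (a^2 - 2ab + b^2)}{2\ell_1\ell_2} = \frac{2r^2 - 2r\cos(\alpha)(a + b) + 2ab}{2\ell_1\ell_2} = \frac{r^2 + ab - r(a + b)\cos(\alpha)}{\ell_1\ell_2}.$$

Since

$$\ell_1\ell_2 = \sqrt{\{r^2 + a^2 - 2ra\cos(\alpha)\}\{r^2 + b^2 - 2rb\cos(\alpha)\}},$$

then we have

$$\theta = \cos^{-1}\left[\frac{r^2 + ab - r(a + b)\cos(\alpha)}{\sqrt{\{r^2 + a^2 - 2ra\cos(\alpha)\}\{r^2 + b^2 - 2rb\cos(\alpha)\}}}\right].$$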


This expression for the angle subtended by the ring at the
observer’s eye makes sense, of course, only as long as α is such that
the entire ring is visible. Too large an α would require “looking
through the ground” to see the inner edge of the ring. We can
calculate the maximum value of α at which the inner ring edge is
still visible by observing that at that α (call it α_max), the line of
sight from the surface of the planet to the inner ring edge is tangent
to the surface of the planet. That is, the radius to the location of the
observer is perpendicular to ℓ₁, and so

$$\cos(\alpha_{\max}) = \frac{r}{b}, \qquad \text{or} \qquad \alpha_{\max} = \cos^{-1}\left(\frac{r}{b}\right).$$


The case of Saturn (with the values r = 56,900 km, a = 138,800 km,
and b = 88,500 km) gives us

$$\alpha_{\max} = \cos^{-1}\left(\frac{56{,}900}{88{,}500}\right) = 49.99^\circ,$$

and so we need consider only the values of θ that occur for the
interval 0° ≤ α ≤ 49.99°.

And by “consider” I mean that this formulation of the problem
literally demands a computer analysis. That is, let’s simply plot θ as
we let α vary from 0° to 49.99°. If there is a maximum for θ in this
interval (where the entire ring is visible) we’ll see it in the plot. This
is an approach not easily available to precomputer-age analysts, of
course, but today it requires only a little time and effort with the
aid of a personal computer. The program I used took five minutes to
write (I used MATLAB, but the code is so simple it is just as easy
to do in any other language), ten minutes to type, and just one
second to execute (on an 800-MHz machine). The result, after a total
of 88,486 floating-point arithmetic operations, is figure 3.4, which
shows that θ does, indeed, have a rather broad maximum around
θ = θ_max = 18.44°, which occurs at α = 33.5°.
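The sweep described above takes only a few lines. This Python sketch is my own illustration, not the MATLAB program mentioned in the text: it uses the quoted values of r, a, and b, samples α every 0.01° (an arbitrary resolution) up to the visibility limit cos(α) = r/b, and reports where θ peaks.

```python
import math

R_PLANET = 56_900.0   # r: radius of Saturn used in the text, km
A_OUTER = 138_800.0   # a: outer radius of the ring, km
B_INNER = 88_500.0    # b: inner radius of the ring, km

def ring_width_deg(alpha_deg, r=R_PLANET, a=A_OUTER, b=B_INNER):
    """Angular width theta (degrees) of the ring seen from latitude alpha (degrees)."""
    c = math.cos(math.radians(alpha_deg))
    num = r * r + a * b - r * (a + b) * c
    den = math.sqrt((r * r + a * a - 2 * r * a * c) * (r * r + b * b - 2 * r * b * c))
    # Clamp to [-1, 1]: at alpha = 0 the ratio is exactly 1 mathematically,
    # and rounding could otherwise push it slightly outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, num / den))))

# Largest latitude at which the inner edge is still visible: cos(alpha) = r / b.
alpha_limit = math.degrees(math.acos(R_PLANET / B_INNER))

# Sweep the latitude in 0.01-degree steps, the way a plot would sample it.
alphas = [i / 100 for i in range(int(alpha_limit * 100) + 1)]
alpha_best = max(alphas, key=ring_width_deg)

print(f"inner edge visible up to alpha = {alpha_limit:.2f} deg")
print(f"maximum width {ring_width_deg(alpha_best):.2f} deg at alpha = {alpha_best:.1f} deg")
```

Because the maximum is broad, small changes in the grid can move the reported α noticeably while leaving θ_max essentially unchanged.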


3.3 The Envelope-Folding Problem


For our next problem, on how a computer can play a highly useful 
role in minimization analyses, consider the following problem that 
FIGURE 3.4. Observed angular width of Saturn’s ring versus latitude (θ, the angular width of the ring, in degrees, plotted against α, the latitude angle, in degrees).


appears to be deceptively simple. We are given a right triangle, with 
perpendicular sides of lengths a and b meeting at the corner O, as 
shown in figure 3.5. Suppose we fold the right angle over to place O 
at some point P on the hypotenuse. This can be done in an infinity 
of ways (with the folded triangle’s sides OX and OY having lengths
x and y, respectively, and such that 0 < x ≤ a, 0 < y ≤ b). Each
such way results in the folded triangle OYX having some area; our 
question is: what is the minimum possible area of OYX? This sounds 
like a simple question, but I don’t think it is. If you don’t agree, then 
shut the book right now and try your hand at it before you read what 
follows. 

To start, let me make some elementary but crucial geometric ob- 
servations. When we fold O onto P we create an image triangle 
(YPX, in dashed lines) that is a copy of the actual, folded triangle. For 
example, the angle YPX is a right angle because angle YOX is a right 
angle. Similarly, angle OYX equals angle PYX (called θ), and angle
FIGURE 3.5. Geometry of the envelope-folding problem.


OXY equals angle PXY (called α). The other angles, called u₁, u₂,
v₁, and v₂, are as shown in the figure. (Angle YOP = u₁ = angle YPO
because, by construction, triangle YOP is isosceles.) Finally, the
dashed line segment OP (with length 2h) is, of course, bisected by YX
because the triangles YPX and YOX are identical. Most importantly,
v₁ = v₂ = 90°, i.e., the line segment OP is perpendicular to the line
segment YX. This last claim may or may not be obvious (try folding
some paper triangles; that’s what I did!), but it is easy to formally
establish. That is,


$$\theta + u_1 + v_2 = 180^\circ \quad \text{(triangle OYC)}$$

and

$$\theta + u_1 + v_1 = 180^\circ \quad \text{(triangle PYC)},$$

and so v₁ = v₂. But, since v₁ + v₂ = 180°, we have v₁ = v₂ = 90°, as
claimed. Also,
$$2\theta + 2u_1 = 180^\circ \quad \text{(triangle OYP)}$$

and so θ + u₁ = 90°, or u₁ = 90° − θ. But as u₁ + u₂ = 90°, then
u₂ = 90° − u₁, and so u₂ = θ. All of this is straightforward, almost so
simple you might wonder why I’ve bothered to spell it out. Here’s
why.
We have the area of the folded triangle as

$$A = \frac{1}{2}xy,$$

where

$$\frac{h}{x} = \cos(\theta) \qquad \text{and} \qquad \frac{h}{y} = \sin(\theta).$$

Thus,

$$A = \frac{h^2}{2\cos(\theta)\sin(\theta)}.$$

Next, using the law of cosines repeatedly on various triangles in
figure 3.5, we can find h (and thus A) as a function of just θ.
In the notation of figure 3.5, we have


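$$\ell_1^2 = (2h)^2 + a^2 - 2(2h)a\cos(\theta),$$

or

$$\ell_1^2 = 4h^2 + a^2 - 4ha\cos(\theta). \qquad (1)$$

Also,

$$\ell_2^2 = (2h)^2 + b^2 - 2(2h)b\cos(90^\circ - \theta),$$

or

$$\ell_2^2 = 4h^2 + b^2 - 4hb\sin(\theta). \qquad (2)$$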
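And,

$$(2h)^2 = \ell_2^2 + b^2 - 2\ell_2 b\cos\left[\tan^{-1}\left(\frac{a}{b}\right)\right],$$

or, as

$$\cos\left[\tan^{-1}\left(\frac{a}{b}\right)\right] = \frac{b}{\sqrt{a^2 + b^2}},$$

we have

$$4h^2 = \ell_2^2 + b^2 - \frac{2\ell_2 b^2}{\sqrt{a^2 + b^2}}. \qquad (3)$$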


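With one more application of the law of cosines, we have

$$(2h)^2 = \ell_1^2 + a^2 - 2\ell_1 a\cos\left[\tan^{-1}\left(\frac{b}{a}\right)\right],$$

or

$$4h^2 = \ell_1^2 + a^2 - \frac{2\ell_1 a^2}{\sqrt{a^2 + b^2}}. \qquad (4)$$

Finally, we get our last equation from the Pythagorean theorem:

$$(\ell_1 + \ell_2)^2 = a^2 + b^2,$$

or

$$\ell_1 + \ell_2 = \sqrt{a^2 + b^2}. \qquad (5)$$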


Substituting (1) into (4) gives

$$\ell_1 = \frac{\sqrt{a^2 + b^2}}{2a^2}\left[2a^2 - 4ha\cos(\theta)\right],$$

while substituting (2) into (3) gives

$$\ell_2 = \frac{\sqrt{a^2 + b^2}}{2b^2}\left[2b^2 - 4hb\sin(\theta)\right].$$

Substituting these two results for ℓ₁ and ℓ₂ into (5) then gives us h
as a function of θ:

$$h = \frac{1}{2\left[\dfrac{\cos(\theta)}{a} + \dfrac{\sin(\theta)}{b}\right]},$$


and so, at last, we have the area of the folded triangle as

$$A = \frac{a^2 b^2}{8\cos(\theta)\sin(\theta)\left[a\sin(\theta) + b\cos(\theta)\right]^2}.$$

Setting dA/dθ = 0 to find the minimum of A is a nasty business (try
it!), and so I won’t do that. Instead, let’s use a computer to study the
behavior of A directly.

We know, physically, that the extremum (minimum) of A occurs
somewhere in the interval 0° < θ < 90°. Not all values of θ in this
interval are possible, however, because we must satisfy the
constraints 0 < x ≤ a and 0 < y ≤ b. The maximum value of θ occurs
when we fold the entire length a up onto the hypotenuse (again, fold
some actual paper triangles if this isn’t clear). Thus,

$$2\theta_{\max} + \tan^{-1}\left(\frac{b}{a}\right) = 180^\circ \quad \text{(triangle OPB)},$$

or

$$\theta_{\max} = 90^\circ - \frac{1}{2}\tan^{-1}\left(\frac{b}{a}\right).$$


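The minimum value of θ occurs when we fold the entire length b up
onto the hypotenuse. Thus,

$$2\left(90^\circ - \theta_{\min}\right) + \tan^{-1}\left(\frac{a}{b}\right) = 180^\circ \quad \text{(triangle OPA)},$$

or

$$\theta_{\min} = \frac{1}{2}\tan^{-1}\left(\frac{a}{b}\right).$$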
FIGURE 3.6. Area of the folded triangle versus the folding angle θ (in degrees), for a = 1, b = 1.


Figures 3.6 and 3.7 show A(θ) plotted over the interval
θ_min ≤ θ ≤ θ_max for the cases of a = b = 1 and a = 2, b = 1,
respectively. For the first case we get the obvious (by symmetry)
result that A_min = 0.125, at θ = 45°, and in the second (not so
obvious) case the answer is A_min = 0.2144.
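Here is a minimal Python sketch of that computer study, assuming nothing beyond the formula for A(θ) and the limits θ_min and θ_max derived above. It evaluates A on a grid of 100,001 points (an arbitrary resolution) between the two limits and keeps the smallest value; a grid search is my choice here, standing in for reading the minimum off a plot.

```python
import math

def folded_area(theta, a, b):
    """A(theta) = a^2 b^2 / (8 cos sin [a sin + b cos]^2), theta in radians."""
    s, c = math.sin(theta), math.cos(theta)
    return (a * b) ** 2 / (8 * c * s * (a * s + b * c) ** 2)

def min_folded_area(a, b, n=100_000):
    """Grid search over theta_min <= theta <= theta_max; returns (A_min, theta in degrees)."""
    theta_min = 0.5 * math.atan(a / b)                # whole side b folded onto the hypotenuse
    theta_max = math.pi / 2 - 0.5 * math.atan(b / a)  # whole side a folded onto the hypotenuse
    grid = (theta_min + (theta_max - theta_min) * i / n for i in range(n + 1))
    best = min(grid, key=lambda t: folded_area(t, a, b))
    return folded_area(best, a, b), math.degrees(best)

for a, b in [(1, 1), (2, 1)]:
    area, theta = min_folded_area(a, b)
    print(f"a = {a}, b = {b}: A_min = {area:.4f} at theta = {theta:.1f} deg")
```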


3.4 The Pipe-and-Corner Problem 


Suppose we want to transport a long, cylindrical pipe through one 
underground tunnel (of width a) into another tunnel at a right angle 
to the first tunnel, all the while keeping the pipe horizontal. We 
imagine that during the move the pipe pivots on the tunnel corner 
FIGURE 3.7. Area of the folded triangle versus the folding angle θ (in degrees), for a = 2, b = 1.


(point A) of figure 3.8, and also that it always slides along the left 
wall of the first tunnel (moving point B). Our question is: how wide 
must the second tunnel be to allow the pipe to be so moved? This is a 
popular textbook problem in introductory calculus courses, where it 
is invariably simplified by reducing the pipe’s diameter to zero, i.e., 
by imagining in figure 3.8 that the pipe is a line segment of length ℓ
and outside diameter w = 0. That makes it easy to find the maximum
value of y, i.e., the maximum extension of the pipe across the second 
tunnel, which, of course, thus determines the required minimum 
width of the second tunnel for the move to be physically possible. 

Far more realistic, however, is to allow the pipe to have something
inside of it, i.e., to have a nonzero diameter! From the geometry
shown in figure 3.8, we have (in the notation of that figure)

$$d_1 + d_2 = \ell,$$
FIGURE 3.8. Geometry of the pipe-and-corner problem.


as well as

$$\frac{y - \dfrac{w}{\cos(\theta)}}{d_1 - w\tan(\theta)} = \sin(\theta)$$

and

$$\frac{a - w\sin(\theta)}{d_2} = \cos(\theta).$$

Solving these last two expressions for d₁ and d₂ and then substituting
into the first expression, we can solve for y as a function of θ (the
pivot angle):

$$y = \ell\sin(\theta) - a\tan(\theta) + \frac{w}{\cos(\theta)}.$$
It is physically obvious that there will be some unique angle
θ = θ₀, between 0° and 90°, at which y will attain its maximum
value. To determine θ₀ analytically we could, as taught in freshman
calculus (and as discussed in the next chapter), set the derivative
of y with respect to θ equal to zero and solve for θ. If you try this,
however, you’ll get

$$\ell\cos^3(\theta) - a + w\sin(\theta) = 0,$$

which is not easily solved analytically for θ. We could, of course,
just plot the left-hand side of this expression and observe where the
curve crosses the θ-axis, and then plug that value for θ back into the
y-equation, but why bother? If we are going to use a computer to
plot a curve then why not just use it to plot the y-equation itself and
directly observe the maximum of y? And that’s just what I’ll do.

Since a is a “natural” dimension of the problem, let’s actually
study the so-called normalized equation

$$\frac{y}{a} = \left(\frac{\ell}{a}\right)\sin(\theta) - \tan(\theta) + \frac{w/a}{\cos(\theta)}.$$

That is, we will find the maximum of y in units of a, given both the
pipe length and the pipe’s outside diameter also in units of a. For
example, if the pipe is 100 feet long, with an outside diameter of
one foot, and if the first tunnel has a width of a = 10 feet, then our
normalized equation becomes

$$\frac{y}{a} = 10\sin(\theta) - \tan(\theta) + \frac{1}{10\cos(\theta)}.$$

This equation is plotted in figure 3.9, which shows that y/a has a
maximum value of 7.168. Thus, the second tunnel must be at least
71.68 feet wide.

If we had used the simple textbook model with w = 0, however,
we would have calculated

$$\theta = \cos^{-1}\left[\left(\frac{a}{\ell}\right)^{1/3}\right] = \cos^{-1}\left[(0.1)^{1/3}\right] = 1.0881\ \text{radians},$$

which, when substituted back into the y-equation, gives the smaller
result
FIGURE 3.9. Turning a nonzero-diameter pipe around (pipe extension into the second tunnel versus the pivot angle θ, in degrees).


$$y_{\max} = 100\sin(\theta) - 10\tan(\theta) = 69.49\ \text{ft}.$$
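The plot-and-look approach is easy to mimic numerically. This Python sketch assumes only the y(θ) formula above; it scans the pivot angle on a grid of 200,000 steps (an arbitrary resolution) for the text's example of a 100-foot pipe in a 10-foot tunnel, once with the one-foot diameter and once with the textbook w = 0.

```python
import math

def pipe_extension(theta, length, a, w):
    """y(theta) = l sin(theta) - a tan(theta) + w / cos(theta), theta in radians."""
    return length * math.sin(theta) - a * math.tan(theta) + w / math.cos(theta)

def max_extension(length, a, w, n=200_000):
    """Scan pivot angles strictly between 0 and 90 degrees; return (y_max, theta in degrees)."""
    grid = (0.5 * math.pi * i / n for i in range(1, n))
    best = max(grid, key=lambda t: pipe_extension(t, length, a, w))
    return pipe_extension(best, length, a, w), math.degrees(best)

# The example from the text: a 100-ft pipe and a 10-ft-wide first tunnel.
y_thick, th_thick = max_extension(100, 10, 1)  # 1-ft outside diameter
y_thin, th_thin = max_extension(100, 10, 0)    # textbook w = 0 model
print(f"w = 1 ft: y_max = {y_thick:.2f} ft at theta = {th_thick:.1f} deg")
print(f"w = 0 ft: y_max = {y_thin:.2f} ft at theta = {th_thin:.1f} deg")
```

The difference between the two maxima is the roughly two-foot discrepancy discussed next.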


Is this two-foot difference significant? Ask yourself that question 
the next time you try to move a (nonzero width) couch around a 
hallway corner from one room to another—half-an-inch (much less 
two feet) too little in the hallway width will ruin your day (I speak 
from experience!). 


3.5 Regiomontanus Redux 


Purists may not like the use of a computer to solve extremal prob- 
lems, preferring pure mathematical demonstrations. They claim that 
while the sheer brute power of a modern computer may be sufficient 
to show some premise is either true or not true, such “calculate to ex- 
haustion” demonstrations don’t show why the conjecture is true or 
false. For example, the use of Eratosthenes’ sieve to find the primes
is perfect for use on a computer and yet nobody would claim that it
tells us, at some deep level, why there is an infinity of primes
(and the sieve certainly doesn’t tell us anything about the still open 
question of the infinity—or not—of the twin primes). 

I expect that new generations of mathematicians will be able 
to expand their list of acceptable tools (which once included just 
the straight edge and the compass) to routinely include computers. 
Indeed, two famous extremal problems of mathematics have already 
yielded to computer analysis in the last quarter of the twentieth 
century. Thomas Hales (of honeycomb conjecture fame, mentioned 
in chapter 2) showed (in 1998, with the aid of enormous computer 
support) that the Kepler Sphere Packing Conjecture (dating from 
1611) is true; face-centered cubic packing of identical spheres (the 
way oranges are displayed in pyramids in grocery stores) gives the
maximum packing density. And Wolfgang Haken (1928- ) and 
Kenneth Appel (born 1932 and now my colleague at the University 
of New Hampshire) at the University of Illinois showed (in 1976 
and with the help of a huge computer program) that the Four-Color 
Conjecture (dating from 1852) is true: to color any planar map so 
that countries sharing a border have different colors requires, at 
most, four colors. 

While the Regiomontanus problem, and its Saturn variant, are 
both clever and distinct in nature from the isoperimetric problems 
of the ancients, they too were initially treated with geometrical 
thinking. That was because the development of calculus, the next 
great step forward in the methods of extremal analysis, still had a 
century to wait. Many students today associate only the name of 
Newton with that development (or perhaps that of Leibniz as well, 
if they’ve heard a bit of history in their math or physics classes). In 
fact, it was the French lawyer and amateur mathematician Pierre 
de Fermat who took the first step toward introducing analytical 
techniques to extremal problems, where once only geometry was 
the means of attack. In the next chapter, then, Fermat and his work 
will take center stage. But, before Fermat, let’s take one last look 
at the Regiomontanus problem and the use of a computer. The 
approach I’ll use here is based on ideas presented in a letter by A. Tan 
and O. Castillo in the October 1983 issue of Mathematics Teacher 
(“Maximizing Paintings,” p. 472). 
The analysis of section 3.1 was literally one-dimensional, with
the “painting” reduced to merely having a vertical dimension. A
real painting, of course, also has a width, as shown in figure 3.10.
Notice carefully that in that figure I have changed the symbols to
be in agreement with Tan and Castillo. The painting’s height and
width are now a and 2b, respectively, and the bottom edge of the
painting is distance c above the eyes of the viewer. The viewer is
imagined to be standing directly in front of the center of the
painting, at a distance x from the vertical wall on which the
painting is hanging.

As Tan and Castillo point out, what we really want to do is maxi- 
mize the solid angle subtended at the viewer’s eyes by the painting. 


FIGURE 3.10. Regiomontanus’ hanging picture (two-dimensional).
This will require us to evaluate a double integral, a big step beyond
anything done so far in this book; you should definitely consider
what follows, then, as optional reading (at least for now). If we pick
an arbitrary patch of differential area of the painting, located at
coordinates (y, z) as shown in figure 3.10, then that differential area
is dA = (dy)(dz) and the distance of dA from the viewer’s eyes is, by a
double application of the Pythagorean theorem, r = √(x² + y² + z²).
Now, if θ is the angle between the viewer’s line of sight to dA and
the normal to the wall, then

$$\cos(\theta) = \frac{x}{r}$$

and the (differential) solid angle subtended at the viewer’s eyes by
dA is

$$d\Omega = \frac{dA\cos(\theta)}{r^2} = \frac{x\,dA}{r^3} = \frac{x\,(dy)(dz)}{(x^2 + y^2 + z^2)^{3/2}}.$$


We get the total solid angle subtended by the entire painting by
integrating over all y and z that define the painting’s extent, and so

$$\Omega = \iint_{\text{entire painting}} d\Omega = \int_{z=c}^{a+c}\int_{y=-b}^{b} \frac{x\,(dy)(dz)}{(x^2 + y^2 + z^2)^{3/2}}.$$

The actual details of doing the integrations are routine but
lengthy and a bit messy (a good table of integrals is the “method” I
used!), and so I’ll simply quote the result:


$$\Omega = \sin^{-1}\left\{\frac{(b^2 - x^2)\left[x^2 + (a + c)^2\right] - 2x^2b^2}{(b^2 + x^2)\left[x^2 + (a + c)^2\right]}\right\} - \sin^{-1}\left\{\frac{(b^2 - x^2)(x^2 + c^2) - 2x^2b^2}{(b^2 + x^2)(x^2 + c^2)}\right\}.$$


We could play with this, algebraically, to get a somewhat simpler
appearing expression (indeed, Tan and Castillo’s formula for Ω is a
bit less intimidating), but I’m not going to bother. After all, what
we would do next, analytically, would be to set the derivative of
Ω with respect to x equal to zero and solve for x. Even with Tan
and Castillo’s expression (and certainly with mine) that proves to
be an astonishingly ugly business! Even with their slightly less
awful formula, Tan and Castillo were still forced to conclude “The
value of x at which Ω maximizes can [read that as must!] be found
numerically.”

In figure 3.11, I have plotted Ω versus x for a particular
Regiomontanus problem considered in section 3.1: a bug on the floor
viewing a painting with its lower edge 8 feet above the floor and its
upper edge 20 feet above the floor. (I first compared the results my
expression gives for the x that maximizes Ω with the numerical
results given by Tan and Castillo’s expression for the examples
treated in their analysis, and they agree exactly.) In the notation of
figure 3.10, then, we have a = 12 feet and c = 8 feet. Assuming a
square painting (b = 6 feet), the plot shows Ω is maximized when the
bug is x = 8.58 feet from the wall, considerably closer than the value
of x = 12.65 feet found when the painting was modeled as having only
FIGURE 3.11. Maximizing a bug’s solid viewing angle.
a vertical dimension. It is only for very wide paintings (b → ∞) that
the solid-angle solution approaches the solution for the case of the 
one-dimensional painting. 
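Following Tan and Castillo's advice, the maximization can be done numerically. This Python sketch is my own illustration: it codes the closed-form Ω quoted above, cross-checks it against a brute-force midpoint-rule evaluation of the double integral (200 × 200 cells, an arbitrary choice), and scans x on a 0.01-foot grid. As in the integral's limits, it assumes the painting runs from y = −b to y = b, so b is half the width.

```python
import math

def solid_angle(x, a, b, c):
    """Closed-form Omega quoted above: painting of height a, spanning y in [-b, b],
    bottom edge c above eye level, viewed from distance x in front of its center."""
    def term(u):
        num = (b * b - x * x) * (x * x + u * u) - 2 * x * x * b * b
        den = (b * b + x * x) * (x * x + u * u)
        return math.asin(max(-1.0, min(1.0, num / den)))  # clamp guards rounding
    return term(a + c) - term(c)

def solid_angle_numeric(x, a, b, c, n=200):
    """Midpoint-rule estimate of the double integral of x dy dz / (x^2+y^2+z^2)^(3/2)."""
    dy, dz = 2 * b / n, a / n
    total = 0.0
    for i in range(n):
        y = -b + (i + 0.5) * dy
        for j in range(n):
            z = c + (j + 0.5) * dz
            total += x / (x * x + y * y + z * z) ** 1.5
    return total * dy * dz

def best_distance(a, b, c):
    """Distance x (0.01-ft grid from 1 ft to 30 ft) that maximizes Omega."""
    return max((i / 100 for i in range(100, 3001)), key=lambda x: solid_angle(x, a, b, c))

# Bug's-eye example: a = 12 ft, c = 8 ft, square painting with b = 6 ft.
print(f"closed form {solid_angle(8.58, 12, 6, 8):.5f} sr, "
      f"numeric {solid_angle_numeric(8.58, 12, 6, 8):.5f} sr")
print(f"best x (square painting): {best_distance(12, 6, 8):.2f} ft")
print(f"best x (very wide painting): {best_distance(12, 1e4, 8):.2f} ft")  # tends to sqrt(8*20)
```

For a very wide painting the optimum should drift toward the one-dimensional answer √(8 · 20) ≈ 12.65 feet.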


3.6 The Muddy Wheel Problem 


For the final problem of this chapter, consider the geometry of figure 
3.12, showing an event that a multitude of medieval mathemati- 
cians must have observed countless times: a wagon wheel rolling 
through a muddy street. The wheel, with radius R, is thickly coated 
with mud, and the rim is continually throwing off mud from every 
point. Our question here is: what is the maximum height above the 
ground reached by the ejected mud? The answer would be of consid- 
erable interest to those sitting in the wagon! To solve this problem, 
I’ll use mathematical methods and physical arguments unknown to 
any medieval mathematician. How those methods came to be de- 
veloped will be the central concern of the next two chapters; seeing 


vcos(a) Vv 


Rsin(a) 


FIGURE 3.12. Geometry of the muddy wheel problem. 
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how neatly and elegantly these methods make short work of the 
muddy wheel problem will graphically illustrate how very different 
are the worlds of medieval and modern mathematics. 

The maximum height of the tossed mud obviously depends on
the speed at which the wheel rotates, and so let’s say that all points
on the rim are moving at a steady speed of v. Thus, when mud comes
off the rim, it is moving at speed v tangent to the rim, at the instant
it leaves the rim. I think it is clear that, for the mud that reaches
the maximum elevation above the road, it must come off the rim
somewhere between the points marked A and B in the figure. So
suppose, as shown, that the radius from the center of the wheel to
some point in that quarter-circle makes angle α with the horizontal.
The vertical component of the ejected mud’s speed, at the instant of
ejection, is thus v cos(α); it is of course only the vertical component
of the mud’s speed that is responsible for the height reached by the
mud. At the instant of ejection (let’s call that instant time t = 0), the
mud is therefore already at a height of R + R sin(α) above the road.

Now, as the ejected mud rises, its vertical speed is continually
reduced by gravity and is given by v cos(α) − gt, where g is the
acceleration of gravity. The mud’s height above the ejection point is
the integral of its vertical speed, i.e., it is given by v cos(α)t − ½gt².
The mud reaches its maximum height, by definition, when its vertical
speed has been reduced to zero. Thus, if t = T is the time it takes to
reach that maximum height, we have

$$v\cos(\alpha) - gT = 0,$$

or

$$T = \frac{v}{g}\cos(\alpha).$$


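And so the maximum height above the ground reached by the mud,
h(α), is

$$h(\alpha) = R + R\sin(\alpha) + v\cos(\alpha)\cdot\frac{v}{g}\cos(\alpha) - \frac{1}{2}g\left[\frac{v}{g}\cos(\alpha)\right]^2,$$

or

$$h(\alpha) = R + R\sin(\alpha) + \frac{v^2}{2g}\cos^2(\alpha).$$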
8 




To find the $\alpha$ that maximizes $h(\alpha)$ (let's call the maximum $H$),
we set $dh/d\alpha = 0$ and find that

$$R\cos(\alpha) - \frac{v^2}{g}\cos(\alpha)\sin(\alpha) = 0,$$

or

$$\sin(\alpha) = \frac{Rg}{v^2}.$$

This, of course, makes sense only if $Rg \le v^2$ (because $|\sin(\alpha)| \le 1$
for all real $\alpha$), and so, for now, let's assume this condition is satisfied.
(I'll return to what $Rg > v^2$ means at the end of this section.) So,
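since we also have

$$\cos^2(\alpha) = 1 - \sin^2(\alpha) = 1 - \frac{R^2g^2}{v^4},$$

then

$$H = R + R\,\frac{Rg}{v^2} + \frac{v^2}{2g}\left(1 - \frac{R^2g^2}{v^4}\right) = R + \frac{R^2g}{v^2} + \frac{v^2}{2g} - \frac{R^2g}{2v^2},$$

or

$$H = R + \frac{v^2}{2g} + \frac{R^2g}{2v^2}.$$

Since we are assuming that $v^2 \ge Rg$, it is useful to subtract $2R$ (the
height of the top of the wheel) and complete the square:

$$H - 2R = \frac{v^2}{2g} - R + \frac{R^2g}{2v^2} = \frac{v^4 - 2Rgv^2 + R^2g^2}{2gv^2} = \frac{(v^2 - Rg)^2}{2gv^2} \ge 0.$$

That is, as long as $v^2 \ge Rg$, mud will always rise to at least
a height even with the top of the wheel (and, of course, the more
$v^2$ exceeds $Rg$, the higher above the wheel the mud will be flung,
perhaps onto the clothing of those riding too near the sides of the wagon).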
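As a numerical sanity check on the calculus, here is a small sketch that brute-forces $h(\alpha)$ over the quarter circle and compares the result with $\sin(\alpha) = Rg/v^2$ and the closed form for $H$. The wheel radius, rim speed, and the value of $g$ in SI units are hypothetical choices that satisfy $v^2 > Rg$; nothing here is taken from the text beyond the formulas.

```python
import math

def mud_height(alpha, R, v, g=9.81):
    """Peak height above the road of mud released at angle alpha (radians)."""
    return R + R * math.sin(alpha) + (v**2 / (2 * g)) * math.cos(alpha) ** 2

def max_height_grid(R, v, g=9.81, n=100_000):
    """Brute-force search of h(alpha) over 0 <= alpha <= pi/2 on a uniform grid."""
    best_alpha, best_h = 0.0, -math.inf
    for i in range(n + 1):
        a = (math.pi / 2) * i / n
        h = mud_height(a, R, v, g)
        if h > best_h:
            best_alpha, best_h = a, h
    return best_alpha, best_h

R, v, g = 0.5, 4.0, 9.81  # hypothetical radius (m) and rim speed (m/s); v^2 > Rg
alpha_star = math.asin(R * g / v**2)                 # from sin(alpha) = Rg/v^2
H_formula = R + v**2 / (2 * g) + R**2 * g / (2 * v**2)
alpha_grid, H_grid = max_height_grid(R, v, g)
print(math.degrees(alpha_star), math.degrees(alpha_grid))
print(H_formula, H_grid, H_formula - 2 * R)
```

The grid search is deliberately naive so that it checks the derivative argument independently rather than reusing it.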

But what if $v^2 < Rg$? Then our condition $\sin(\alpha) = Rg/v^2$ is
impossible to satisfy; so, let's return to the expression for $h(\alpha)$
just before we differentiated it. Then,

$$h(\alpha) = R\left[1 + \sin(\alpha) + \frac{v^2}{2Rg}\cos^2(\alpha)\right],$$

and so, because $v^2/(Rg) < 1$, for $0^\circ \le \alpha < 90^\circ$

$$h(\alpha) < R\left[1 + \sin(\alpha) + \frac{1}{2}\cos^2(\alpha)\right].$$

Defining $f(\alpha) = 1 + \sin(\alpha) + \frac{1}{2}\cos^2(\alpha)$, we have
$h(\alpha) < Rf(\alpha)$.

Since

$$\frac{df}{d\alpha} = \cos(\alpha) - \cos(\alpha)\sin(\alpha) = \cos(\alpha)[1 - \sin(\alpha)],$$

and since $\sin(\alpha) < 1$ and $\cos(\alpha) > 0$ for $0^\circ \le \alpha < 90^\circ$
($\alpha$ never equals $90^\circ$, as we have $v^2 < Rg$, not $v^2 \le Rg$), then

$$\frac{df}{d\alpha} > 0, \quad 0^\circ \le \alpha < 90^\circ.$$

Since the derivative of $f(\alpha)$ is the slope of the tangent line to the
$f(\alpha)$ versus $\alpha$ curve, this result says that $f(\alpha)$ increases steadily
over the semiclosed interval $0^\circ \le \alpha < 90^\circ$, approaching its maximum
value of 2 as $\alpha$ approaches $90^\circ$. Thus, when $v^2 < Rg$, the value of $H$
is strictly less than $2R$ (the top of the wheel). Mud coming off the wheel
right at the top of the wheel (with no vertical speed) automatically achieves
the height of $2R$, but certainly no mud is ever flung above the wheel
if $v^2 < Rg$.
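The $v^2 < Rg$ case can be spot-checked the same way. This sketch, again with hypothetical values (here $v^2 = 4 < Rg \approx 4.9$ in SI units), samples $h(\alpha)$ every $0.1^\circ$ below $90^\circ$ and confirms that it rises steadily and never reaches $2R$.

```python
import math

def mud_height(alpha, R, v, g=9.81):
    """Peak height above the road of mud released at angle alpha (radians)."""
    return R + R * math.sin(alpha) + (v**2 / (2 * g)) * math.cos(alpha) ** 2

R, v, g = 0.5, 2.0, 9.81  # hypothetical values with v^2 < Rg
alphas = [math.radians(k / 10) for k in range(0, 900)]  # 0.0, 0.1, ..., 89.9 degrees
heights = [mud_height(a, R, v, g) for a in alphas]

# h(alpha) should increase monotonically and stay strictly below the wheel top 2R
increasing = all(h2 > h1 for h1, h2 in zip(heights, heights[1:]))
print(increasing, max(heights) < 2 * R)
```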

As late as the start of the seventeenth century, there were no 
mathematicians on earth who could have done this analysis. At the 
end of the seventeenth century, there were many. What happened 
during that century—that advanced the mathematics of extrema in 
such a revolutionary way—is the central topic we take up next. 


Solution to the Problem in Section 3.1 


The answer to the trousers/mirror version of the Regiomontanus
problem is that the man should stand at a distance of
$\frac{1}{2}\sqrt{h(h - \ell)}$ from the mirror. (The key observation is that the
man's trousers appear as far behind the mirror as he stands in
front of it.) This is the correct mathematical result, but does it
really "make sense"? For example, I wear 31-inch trousers and
my eyes are 70 inches above the floor. But, when I try trousers
on at my local men's clothing store, I stand significantly farther
away from the dressing room mirror than $\frac{1}{2}\sqrt{70(70 - 31)}$
= 26.1 inches. Apparently simply maximizing the viewing angle
does not really capture what is meant by "best view."
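A quick numerical check of the 26.1-inch figure, assuming the usual Regiomontanus geometry: the eye is at height h, the trousers run from the floor up to height ℓ, and, per the key observation, their image sits a horizontal distance 2d from the eye when the man stands d in front of the mirror. The function name `viewing_angle` is just an illustrative label.

```python
import math

def viewing_angle(d, h, ell):
    """Angle (radians) subtended at the eye by the mirror image of the trousers."""
    D = 2 * d  # the image appears as far behind the mirror as the man stands in front
    return math.atan(h / D) - math.atan((h - ell) / D)

h, ell = 70.0, 31.0  # eye height and trouser length in inches, from the text
d_formula = 0.5 * math.sqrt(h * (h - ell))
# brute-force search in 0.01-inch steps from 0.01 to 100 inches
d_best = max((k / 100 for k in range(1, 10_001)), key=lambda d: viewing_angle(d, h, ell))
print(round(d_formula, 1), d_best)
```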


4. The Forgotten War of Descartes and Fermat


4.1 Two Very Different Men 


Modern students, when first introduced to the differential calculus, 
learn that it was the simultaneous and independent creation of the 
Englishman Isaac Newton (1642-1727) and the German Gottfried 
Leibniz (1646-1716). Perhaps they are told that Newton and Leibniz 
(and their respective followers) engaged in a lengthy and acrimo- 
nious debate over intellectual priority, and that Newton continued 
the battle even after Leibniz’s death, right up to the day of his own 
death. Almost certainly, however, they are not told anything about 
an equally nasty war of words between two French mathematicians 
a half-century earlier, on some of the same issues that later engaged 
Newton and Leibniz. 

Pierre de Fermat (1601-65) and René Descartes (1596-1650) were 
very different men. Fermat was a family man, trained as a lawyer 
who loved mathematics as a pastime, and who so valued his privacy 
that he published very little (and even then, only anonymously). He 
was a Classical scholar as well, fluent in Italian, Spanish, Latin, and 
Greek, and an omnivorous student of the writings of the ancient 
mathematicians. Descartes, also trained as a lawyer, was a man who 
soon came to embrace public acclaim, who published widely, and 
who devoted his whole life to the single-minded pursuit of abstract 
knowledge. A family would have been a distraction, and Descartes 
never married (although in 1635 he did have a daughter by one of 
his servants). 
Unlike Fermat, Descartes gave the impression that he was often
uninformed of what others had done before him; at least he only 
rarely mentioned the work of anybody else in his writings. And 
when he did, it was often in the most unpleasant manner one could 
imagine: at various times in his life he called his critics “two or three 
flies,” “less than a rational animal,” “a little dog,” and “extremely 
contemptible.” The actual works of others were often rejected in 
incredibly offensive language, e.g., as being fit only for use as “toilet 
paper” or, in the case of Fermat, as being “shit.” 

We remember both men for things quite different from what they
fought over: Descartes for his philosophical writings and the joining
of algebra with geometry into analytic geometry, and Fermat for 
his work in probability and number theory, particularly the famous 
and only recently resolved “Fermat’s Last Theorem.” What these two 
brilliant intellects battled over, however, was none of this, but rather 
first a problem in physics, and then the beginnings of how to answer 
extremal questions through analysis rather than the classical tool of 
geometry. 

The origins of the conflict between the two men can be found in 
Descartes’ essay on optics, La Dioptrique, one of the appendices in his 
1637 book Discourse on Method. There he treated the phenomenon of 
the refraction of light, which is the next natural question to pursue 
after noticing the details of the reflection of light. (Descartes’ interest 
in the law of refraction was also motivated by his research into the 
nature of the rainbow, which he—and others before him—correctly 
believed to be due to the scattering of sunlight by water droplets in 
the air. Descartes needed the law of refraction to mathematically de- 
scribe that scattering, and I’ll return to the rainbow problem in the 
next chapter.) Both phenomena, reflection and refraction, involve 
extremal arguments of a quite different nature (and the solution to 
one of the first problems in the calculus of variations—discussed in 
chapter 6—used the refraction law), so let me make a brief digression 
to describe them. 


“And God said, ‘Let there be light’; and there was light... . 
But we can imagine the angelic architect asking for more de- 
tails: ‘What path shall light follow in going from P to Q?’ 
And the answer might have been, ‘Don’t bother me with such 
details. See that it makes the trip in minimum time.’ From this 
minimal principle one finds that for reflection the angle of incidence
should equal the angle of reflection, while for refraction
at an interface the ratio of the sine of the angle of incidence to 
the sine of the angle of refraction must equal the ratio of speeds 
in the two media.” 
“And God saw that the light was good.” 
—Arthur Bernhart, Scripta Mathematica 1959, p. 206. 


Professor Bernhart might have mentioned, however, that there 
are two ways to form the ratio of the speeds; Descartes got it 
wrong, but Fermat got it right as you’ll see in what follows. 
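Bernhart's minimum-time principle is easy to test numerically. The sketch below assumes a flat interface along y = 0, a source A = (0, a) in medium 1 with light speed v1, and a target B = (L, -b) in medium 2 with speed v2; the geometry and speeds are hypothetical. It finds the least-time crossing point by ternary search (the travel time is convex in x) and then compares the ratio of sines, measured from the normal, with the ratio of speeds.

```python
import math

def travel_time(x, a, b, L, v1, v2):
    """Time for light to go from A=(0, a) to the interface point (x, 0) to B=(L, -b)."""
    return math.hypot(x, a) / v1 + math.hypot(L - x, b) / v2

def least_time_crossing(a, b, L, v1, v2, iters=200):
    """Ternary search for the crossing point; travel_time is convex in x on [0, L]."""
    lo, hi = 0.0, L
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if travel_time(m1, a, b, L, v1, v2) < travel_time(m2, a, b, L, v1, v2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

a, b, L = 1.0, 1.0, 2.0  # hypothetical geometry
v1, v2 = 1.0, 0.75       # hypothetical speeds: light is slower in medium 2
x = least_time_crossing(a, b, L, v1, v2)
sin1 = x / math.hypot(x, a)            # sine of the angle of incidence (from the normal)
sin2 = (L - x) / math.hypot(L - x, b)  # sine of the angle of refraction (from the normal)
print(sin1 / sin2, v1 / v2)
```

The ratio of sines comes out equal to v1/v2, the speed in the first medium over the speed in the second, which is the ordering the quotation describes.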


4.2 Snell’s Law 


FIGURE 4.1. Geometry of Heron's reflection law.

It was Euclid who first made note (three centuries before Christ) of
the now familiar reflection law of light: if a beam of light is sent
toward a mirror, then the angle of incidence equals the angle of
reflection ($\theta_i = \theta_r$ in figure 4.1), not only for a flat mirror but for
a curved mirror as well (for curved mirrors we measure the two
angles with respect to the tangent line at the point of reflection,
R). It was Heron of Alexandria, however, who first observed (in
the first century A.D., in his book on mirrors, Catoptrica) that the 
reflection law is the immediate consequence of assuming that the 
beam path ARB is the minimum length path. That is, if the point 
R on the mirror were such that $\theta_i \neq \theta_r$, then the resulting total
path length would be longer. (The implicit assumption is, of course,
that the beam of light does reflect off the mirror; the absolute
shortest path from A to B is simply the direct, straight line segment 
joining the two points. Indeed, if a light bulb is at A, broadcasting 
light in all directions, then B receives light along the two paths ARB 
and AB.) Heron’s observation is the first occurrence of a minimum 
principle in mathematical physics; such principles play central roles 
in modern theoretical physics. It is impressive and instructive to 
examine how Heron derived the reflection law from this particular 
minimum principle. 

If the destination point B is distance $d$ above the mirror, then
B's reflected point (B′) is distance $d$ "below" the mirror. RB and RB′
are, therefore, the equal-length hypotenuses of two congruent right
triangles, which means $\theta' = \theta_r$, where $\theta'$ is the angle RB′ makes
with the mirror (referring again to figure 4.1). Now, the total light
path length is AR + RB = AR + RB′, and this last sum is the path
length from A to B′. The shortest path from A to B′ (and so the
shortest length for the reflected path, as well) is along a straight
line, and so $\theta' = \theta_i$ (they are then vertical angles), which
immediately gives $\theta_i = \theta_r$, i.e., the reflection law.
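Heron's minimum-length argument can also be checked by brute force. This sketch assumes a flat mirror along the x-axis and hypothetical points A = (0, 3) and B = (10, 1). It scans candidate reflection points R, keeps the one that minimizes AR + RB, compares the two angles the path makes with the mirror, and compares R with the point given by Heron's reflected-point construction (the straight line from A to B′ = (10, -1)).

```python
import math

def path_length(x, A, B):
    """Length of A -> R -> B, where R = (x, 0) lies on the mirror (the x-axis)."""
    R = (x, 0.0)
    return math.dist(A, R) + math.dist(R, B)

A, B = (0.0, 3.0), (10.0, 1.0)  # hypothetical points above the mirror
xs = [B[0] * k / 100_000 for k in range(100_001)]  # candidate reflection points
x_best = min(xs, key=lambda x: path_length(x, A, B))

theta_i = math.degrees(math.atan2(A[1], x_best - A[0]))  # angle of AR with the mirror
theta_r = math.degrees(math.atan2(B[1], B[0] - x_best))  # angle of RB with the mirror
# Heron's construction: where the straight line from A to B' = (B[0], -B[1]) meets the mirror
x_heron = A[0] + (B[0] - A[0]) * A[1] / (A[1] + B[1])
print(x_best, x_heron, theta_i, theta_r)
```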

With the reflection law thus established, attention turned next to 
refraction, the phenomenon of the change in direction experienced 
by a beam of light when it crosses the interface from one
transparent medium into another (from air into water or into glass, for
example), as shown in figure 4.2. Attempts to formulate a mathe- 
matical description of refraction can be traced as far back as Ptolemy; 
the first preliminary mathematical results appeared in the German 
astronomer Johannes Kepler’s 1611 Dioptrice, but it wasn’t until the 
experimental work of the Dutch physicist Willebrord Snel (1580- 
1626) that the precise form of the refraction law was discovered. If 
we measure the angles $\theta_i$ and $\theta_r$ with respect to the interface
normal (the dashed line in figure 4.2), then Snel discovered around
1621 (but didn't publish) what is now called Snell's law (the double-l
spelling is from the Latinized form of his name, Snellius):

$$\frac{\sin(\theta_i)}{\sin(\theta_r)} = \text{"constant"},$$

where the "constant" is a function of the nature of the two media.
Snel observed that if medium 2 is denser than medium 1 (as with
a light beam traveling from air into water) then the "constant" is
greater than one. That is, $\sin(\theta_i) > \sin(\theta_r)$ or, equivalently,
$\theta_i > \theta_r$; i.e., upon entering the water the light beam bends toward
the normal.

FIGURE 4.2. Geometry of Snell's refraction law (labels: medium 1, media interface, medium 2).

Snel’s notes on his experiments were lost some time after 1662, 
and what we know of them is only through the writings of those 
who saw them. Somewhat less well known, unpublished experimen- 
tal research leading to Snell’s law, years before Snel, is also attributed 
to the English mathematical physicist Thomas Hariot (1560-1621). 
Hariot died from a cancer of the nose—due to a youthful intoxica- 
tion with tobacco while serving as the science officer on a colonizing 
expedition to Virginia in 1585(?)—and his scientific research had 
ceased by 1618, three years before Snel’s research. History records, 
however, that it was Descartes who first published, in La Dioptrique, 




a theoretical derivation of the law of refraction, as well as offering an 
explanation of the “constant.” Many historians have long believed 
that Descartes learned of Snel’s experimental work and then used 
that knowledge to guide the often strained physical assumptions of 
the nature of light that appear in his analysis. Other historians
disagree but, for a while at the end of the 1600s, there were nasty
rumblings in the scientific community about plagiarism on Descartes'
part. Descartes' former admirer, Christiaan Huygens, who as a young
boy met Descartes often when the Frenchman visited Huygens'
father, was among those who suspected the worst. The matter is still
not fully resolved.

It is Descartes’ “derivation” of Snell’s law that Fermat read in 
1637 and found lacking in merit, and he said as much in a letter 
to a correspondent who also had contact with Descartes. Fermat’s 
skeptical reaction soon got back to Descartes, and the war was on. 
So, how did Descartes derive Snell’s law, and why did Fermat think 
that derivation wrong, a view shared by all modern physicists since 
the middle of the nineteenth century? 

Adopting a particle view of light, Descartes began his analysis
by making an analogy with a tennis ball hitting a cloth barrier at
incident angle $\theta_i$. He argued that the ball would lose some of its
speed in the vertical direction only, because the cloth would offer
no resistance to the ball in the horizontal direction. The horizontal
speed component, therefore, would be unchanged. To express this
claim mathematically, let the ball's speed before hitting the cloth
barrier be $v_1$ (as shown in figure 4.3) and $v_2$ after penetrating the
cloth. Then, the ball's horizontal component of speed above the
cloth (in medium 1) is $v_1\sin(\theta_i)$, which is, according to Descartes,
also the horizontal component of speed below the cloth (in medium
2). Since that component is $v_2\sin(\theta_r)$, then

$$v_1\sin(\theta_i) = v_2\sin(\theta_r),$$

or

$$\frac{\sin(\theta_i)}{\sin(\theta_r)} = \text{constant} = \frac{v_2}{v_1}.$$


The first equality is Snell's law (the ratio of sines is a constant), but
Descartes' derivation has actually led him into serious difficulty. Snel
had not explained what the constant actually is, while Descartes had
apparently shown it is the ratio of the ball's speeds in the two media.
If we stick with the tennis ball analogy to light (i.e., a particle or so-called
corpuscular view of light), with $v_2 < v_1$ (the ball loses speed as
it penetrates the cloth barrier), then Descartes' version of Snell's law
must be wrong since it says

$$\frac{\sin(\theta_i)}{\sin(\theta_r)} < 1, \quad \text{i.e., } \theta_i < \theta_r.$$

That is, the ball (particle of light) would veer away from the normal
as it enters a denser medium. As mentioned before, however, light
is observed to do precisely the opposite. To bring his result into
agreement with experiment, then, Descartes had to drop the ball
analogy in midstream and conclude that $v_2 > v_1$, i.e., that light
speeds up as it penetrates the barrier. Thus, Descartes was forced
to conclude that the speed of light is greater in denser media. In
Descartes' day there was no experimental measurement of the speed
of light in any medium, and so no one could say whether he was
right or not. And experiment is, of course, the only way to really
settle such a question (unless you are a philosopher). To "explain"
how light gains speed as it passes from air into water, Descartes put
forth arguments in the tradition of Aristotle ("physics the way we
think it should be, rather than what experiment shows it to be"),
arguments that today seem ludicrous.

FIGURE 4.3. Geometry of Descartes' refraction law "derivation" (labels: medium 1, cloth barrier, medium 2).
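To see concretely why the tennis-ball version fails, this sketch applies Descartes' relation $v_1\sin(\theta_i) = v_2\sin(\theta_r)$ with a hypothetical slowdown ($v_2 = 0.75\,v_1$) and reports the refraction angle. Every computed $\theta_r$ exceeds $\theta_i$, i.e., the "ball" bends away from the normal, the opposite of what is observed for light entering water.

```python
import math

def descartes_refraction_angle(theta_i_deg, v1, v2):
    """Refraction angle (degrees) from Descartes' rule v1*sin(theta_i) = v2*sin(theta_r)."""
    s = v1 * math.sin(math.radians(theta_i_deg)) / v2
    if s > 1:
        return None  # no real refracted ray for this angle of incidence
    return math.degrees(math.asin(s))

v1, v2 = 1.0, 0.75  # hypothetical: the cloth slows the ball to 75% of its speed
for theta_i in (10, 30, 45):
    print(theta_i, descartes_refraction_angle(theta_i, v1, v2))
```

At larger angles (60 degrees, for instance) the function returns `None`, since the rule would then require a sine greater than one.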




Descartes’ view of space was that there is no empty space (for 
him a vacuum was impossible), and that all apparently empty space 
between macroscopic bodies was actually filled with an invisible 
“something.” This view was long in dying, and right up to the end 
of the nineteenth century all physicists—until Einstein—believed 
that light needed that “something” to move through. They called 
it the ether and, while not Descartes’ term, it represented his view. 
Descartes also believed that light was a “pressure” that traveled in- 
finitely fast (some historians claim he really only asserted it trav- 
els very fast, not infinitely fast) through the “something,” a view 
that would be logically at odds with an assertion that light travels 
faster in water than in air. Both Fermat and Descartes were dead be- 
fore the first experimental measurement of the finite speed of light 
was made (an astronomical experiment in 1675 by the Danish as- 
tronomer Olaus Roemer, based on the timing of Jupiter’s eclipsing of 
its moons). And it wasn’t until almost another two centuries later that 
the speed of light in water was measured to be less than that in air (a 
terrestrial experiment in 1850, by the French physicists Hippolyte 
Fizeau and Jean Foucault). If one assumes that light is the wave phe- 
nomenon pioneered by Huygens, rather than a particle one, then 
the slowing of light in water allows one to derive Snell's law of
refraction (see any college freshman physics textbook); indeed, the
Fizeau/Foucault result was interpreted as proof that beams of light 
are waves, not particles. Things are actually not quite that simple, 
but that is an issue for a book on quantum electrodynamics! 

You can find more on Descartes and his flawed optical physics, at 
a fairly technical level, in the paper by W. B. Joyce and Alice Joyce, 
“Descartes, Newton, and Snell’s Law” (Journal of the Optical Society 
of America, January 1976, pp. 1-8), and at the historian’s level in 
the books by William R. Shea, The Magic of Numbers and Motion: The 
Scientific Career of René Descartes (Science History Publications 1991) 
and A. I. Sabra, Theories of Light: From Descartes to Newton (Cambridge
University Press 1981). The second book, in particular, details
the bitter feelings Descartes had toward Fermat because of Fermat’s 
rejection of Descartes’ “derivation” of Snell’s law. You can find more 
on Descartes’ flawed physics, in general, in Herman Erlichson’s “The 
Young Huygens Solves the Problem of Elastic Collisions” (American 
Journal of Physics, February 1997, pp. 149-54). And finally, you can 
find a very detailed description of Hariot’s ingenious experiments 
on refraction in John W. Shirley, “An Early Experimental
Determination of Snell’s Law” (American Journal of Physics, December 1951,
pp. 507-8). 

When Fermat read La Dioptrique he was unimpressed and, as men- 
tioned earlier, was blunt in his criticism. He wrote, in part, “of all the 
infinite ways [to analyze the motion of light] the author [Descartes] 
has taken only that one which serves him for his conclusion; he has 
thereby accommodated his means to his end, and we know as little 
about the subject as we did before.” An uncharitable reading of this 
might be that Descartes knew what the answer must be—from his 
earlier knowledge of Snel’s experimental work—and so he simply 
fiddled with his physical assumptions until he got what he knew 
experiment said he had to get. In other words, Descartes’ so-called 
derivation of Snell’s law of refraction was no more than a begging of 
the question. (When I was a college undergraduate, the writing of 
a made-up lab report for a missed chemistry experiment was called 
a “dry lab,” as compared to actually doing the experiment and get- 
ting real data, which was, of course, a “wet lab.” Fermat thought 
Descartes’ “derivation” to be a dry lab!) Further, Fermat rejected as 
nonsense Descartes’ assertion of the infinite speed of light and his 
subsequent illogical argument that light travels faster (than infin- 
ity?) in water than in air. Fermat’s position was that light traveled at 
a (very fast) finite speed in air, and that it was slowed when traveling 
through a denser (“more resistive”) medium such as water. 

Fermat initially believed that, since Descartes’ derivation was 
clearly (to Fermat) built on sand, then the “ratio of sines is a con- 
stant” result must be incorrect. Eventually Fermat learned that the 
formula was, in fact, generally accepted as true because it could be 
verified by direct experiment! This greatly puzzled Fermat; how had 
Descartes managed to derive the correct law of refraction from er- 
roneous arguments? It became a quest for Fermat to find a phys- 
ically correct derivation of the law of refraction; he believed that 
the law would be mathematically different from Descartes’ ratio of 
sines result while also being able to give nearly the same numerical 
results, thus explaining the (coincidental) experimental agreement 
with Descartes’ result. With Fermat’s subsequent great discovery of 
the “principle of least time” (discussed later in this chapter) his quest 
ended in 1658 with success—but with a surprising twist that aston- 
ished Fermat. 
To carry out the calculations involved in applying the principle of
least time to the refraction of light, Fermat used new mathematical
techniques of his own devising, including something very close to what
is now called the derivative of a function. Stimulated by an observation
due to Kepler (see the box at the end of this section), namely that at an
extremum, either minimum or maximum, a function f(x) is not changing
as tiny changes are made in x, Fermat transformed Kepler's insight into
mathematics. Fermat had completed this work by 1629, and he published
his discoveries in 1637 as Method for Determining Maxima and Minima
and Tangents to Curved Lines. The date is important, as Descartes saw
it just after learning of Fermat's rejection of La Dioptrique, and so,
Descartes being Descartes, replied in kind to Fermat's work. (Descartes also saw a
challenge in Fermat’s Method to yet another of Descartes’ appendices 
to his Discourse; Descartes thought his Geometry did what Fermat 
claimed to do, only better, with his—Descartes’—mathematics.) 

The irony in all of this is delicious. Descartes rejected Fermat’s 
work on maxima and minima largely because Fermat had rejected 
Descartes’ derivation of the law of refraction. Then Fermat used his 
maxima/minima technique to correctly derive the refraction law, as 
well as giving a proper explanation of the “constant” in Snell’s law.
We'll take up Fermat’s mathematics for our next discussion. 


When engineers and scientists think of Johannes Kepler
(1571-1630) it is almost certainly in connection with his famous
three laws of planetary motion. An often ignored aspect
of his genius, however, is his contribution to the early devel- 
opment of the differential and integral calculus. What is par- 
ticularly amusing about this is what motivated Kepler in those 
mathematical researches; shortly after his second marriage in 
1613, while setting up a new household, he learned how wine 
merchants determined the “volume” of wine barrels. They sim- 
ply stuck a rod in through a hole at the edge of the top lid 
and measured the length of the barrel diagonal from top to 
bottom, without regard to the actual shape of the barrel. This 
made no sense to a man with Kepler’s mathematical ability, 
of course, and he began to think upon the question of just 
how one would compute the volumes of various barrel shapes. 
Kepler published the results of his work in the 1615 book
Stereometria doliorum vinariorum (New Solid Geometry of Wine Barrels).
One result is particularly interesting for us: of all cylinders with
the same diagonal, the one with the maximum volume is the
one in which the ratio of the diameter to the height is $\sqrt{2}$ (a
rather squat barrel resembling, in fact, the storage tanks used 
to hold oil in petroleum refineries). This result is worked out in 
the next chapter as an example of the new calculus of Newton 
and Leibniz. 
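Ahead of the calculus treatment promised for the next chapter, here is a brute-force sketch of Kepler's barrel result. It assumes the "diagonal" is the straight line from a top edge to the opposite bottom edge, so that diameter² + height² = diagonal², and fixes the diagonal at 1 (an arbitrary choice).

```python
import math

def volume(H, d):
    """Volume of a cylinder of height H whose diagonal has fixed length d."""
    D_squared = d**2 - H**2  # assumed: diagonal^2 = diameter^2 + height^2
    return math.pi * D_squared * H / 4

d = 1.0  # arbitrary fixed diagonal
H_best = max((k / 100_000 for k in range(1, 100_000)), key=lambda H: volume(H, d))
D_best = math.sqrt(d**2 - H_best**2)
print(D_best / H_best, math.sqrt(2))
```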


4.3 Fermat, Tangent Lines, and Extrema 


Recall the problem we solved back in chapter 1, using the method of 
completing the square: how should a constant C be divided into two 
parts so their product is maximized? There we wrote the two parts as 
$x$ and $C - x$, and their product as $M = x(C - x)$. Fermat solved this
same problem with a new approach, as follows. Expanding, we have

$$x^2 - Cx + M = 0.$$

Solving for $x$ gives

$$x = \frac{C \pm \sqrt{C^2 - 4M}}{2}.$$

He next argued that if $M$ is the maximum product then there should
be just one value of $x$ that achieves that maximum. Thus, the
quantity under the square root sign must be zero. So,

$$C^2 - 4M = 0$$

and

$$M = \frac{1}{4}C^2.$$

Thus, $x = \frac{1}{2}C$ gives the maximum product.
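Fermat's double-root idea can be watched in action with a small sketch: for a fixed C (here C = 10, an arbitrary choice), the equation $x^2 - Cx + M = 0$ has two real roots when M is below the maximum product, exactly one (double) root when $M = C^2/4$, and none beyond it.

```python
import math

def roots(C, M):
    """Real roots of x^2 - C x + M = 0, i.e. the ways to split C into parts with product M."""
    disc = C**2 - 4 * M
    if disc < 0:
        return []
    r = math.sqrt(disc)
    return sorted({(C - r) / 2, (C + r) / 2})  # a set collapses the double root

C = 10.0
for M in (16.0, 24.0, 25.0, 26.0):
    print(M, roots(C, M))
```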
FIGURE 4.4. Constructing a tangent to a parabola.


Fermat also applied his “double-root” idea to the problem of 
drawing tangents to a given curve, at a given point. For example, 
consider the parabola $x = -y^2$ shown in figure 4.4. Suppose the
given point is B, with coordinates $x = -s$, $y = \sqrt{s}$, $s > 0$. The
generic equation of the tangent line is, of course, the well-known
equation for a straight line

$$y = mx + b,$$

where $m$ is the slope and $b$ is a constant. Now, this straight line
intersects the parabola at the solutions to

$$x = -(mx + b)^2,$$

which is easily solved to give

$$x = \frac{-(2mb + 1) \pm \sqrt{(2mb + 1)^2 - 4m^2b^2}}{2m^2}.$$

Fermat next invoked his central argument, that there is only one
actual intersection of the tangent line with the curve, and so it must
be true that

$$(2mb + 1)^2 - 4m^2b^2 = 0.$$

This is easily solved to give

$$b = -\frac{1}{4m}.$$

Therefore, at the given point B we have (since B is on the tangent
line)

$$\sqrt{s} = -ms - \frac{1}{4m},$$

which is again easily solved to give

$$m = -\frac{1}{2\sqrt{s}} \quad \left(\text{and so } b = \tfrac{1}{2}\sqrt{s}\right).$$

Thus, the equation of the tangent line is

$$y = -\frac{1}{2\sqrt{s}}\,x + \frac{1}{2}\sqrt{s} = \frac{s - x}{2\sqrt{s}}, \quad s > 0.$$

To actually draw the tangent line, it is sufficient to locate point E
in figure 4.4, the intersection point of the tangent line with the
x-axis. Setting $y = 0$, then, the x-coordinate of E is $x = s$. So, Fermat's
procedure for drawing the tangent line at any given point B on the
parabola $x = -y^2$ is the following four-step process:


1. drop the perpendicular from B to the x-axis, to the point C in 
figure 4.4. 

2. measure the length CD = s, where D is the coordinate system 
origin. 

3. DE = s, too, thus determining E.

4. connect B and E with a straight line. 
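The four-step construction can be checked numerically. This sketch takes a sample point (s = 4, an arbitrary choice), builds the tangent from the slope and intercept derived above, and confirms three things: the discriminant of the intersection equation $m^2x^2 + (2mb + 1)x + b^2 = 0$ vanishes, the single (double) intersection is at B's abscissa $x = -s$, and the line crosses the x-axis at $x = s$, which is point E.

```python
import math

def fermat_tangent(s):
    """Slope and intercept of the tangent to x = -y**2 at B = (-s, sqrt(s)), s > 0."""
    m = -1 / (2 * math.sqrt(s))  # slope, from requiring B to lie on the line
    b = -1 / (4 * m)             # intercept, from the double-root condition
    return m, b

def intersection_check(m, b):
    """For y = m x + b meeting x = -y**2, i.e. m^2 x^2 + (2mb + 1) x + b^2 = 0,
    return the discriminant and the root it gives when the discriminant is zero."""
    qa, qb, qc = m**2, 2 * m * b + 1, b**2
    disc = qb**2 - 4 * qa * qc
    return disc, -qb / (2 * qa)

s = 4.0
m, b = fermat_tangent(s)
disc, x_touch = intersection_check(m, b)
x_E = -b / m  # where the tangent crosses the x-axis (y = 0)
print(m, b, disc, x_touch, x_E)
```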
Fermat later altered his “double-root” argument into what is
nearly the modern approach for finding the extrema of a function
by setting the first derivative to zero. That is, returning to the
example that started this section, suppose $\hat{x}$ is the value of $x$ that gives
the maximum product, and that $E$ is a “very small” quantity. Then
using $\hat{x}$ or $\hat{x} + E$ should give nearly equal results, i.e.,

$$\hat{x}^2 - C\hat{x} + M \approx (\hat{x} + E)^2 - C(\hat{x} + E) + M,$$

and the near-equality will become a true equality as we let $E \to 0$.
So, expanding and canceling equal terms on both sides, we arrive at

$$0 \approx 2\hat{x}E + E^2 - CE.$$

Since we haven't yet let $E$ go all the way to zero, we can divide
through by $E$ to get

$$0 \approx 2\hat{x} + E - C.$$

This division by $E$ is crucial, of course, because otherwise as we let
$E \to 0$ we would get nothing but the undeniably true (but not very
interesting and certainly not useful) tautology of $0 = 0$. But, if after
the division we let $E \to 0$, we get the equality

$$0 = 2\hat{x} - C.$$

Thus, as before, $\hat{x} = \frac{1}{2}C$.
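Fermat's "compare, divide by E, then let E vanish" step can be mirrored with exact rational arithmetic. This sketch uses Python's `fractions.Fraction` so that no rounding blurs the algebra, with hypothetical values C = 10 and M = 7 (M cancels anyway). At $\hat{x} = C/2$ the quotient is exactly E, which vanishes as E shrinks; at a non-optimal point such as x = 3 it tends to $2 \cdot 3 - C = -4$ instead.

```python
from fractions import Fraction

def g(x, C, M):
    """The expression x^2 - C x + M that Fermat compares at x and x + E."""
    return x * x - C * x + M

def fermat_quotient(x, C, M, E):
    """[g(x + E) - g(x)] / E, computed exactly; algebraically it equals 2x + E - C."""
    return (g(x + E, C, M) - g(x, C, M)) / E

C, M = Fraction(10), Fraction(7)  # hypothetical values; M cancels
x_hat = C / 2
for E in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    print(E, fermat_quotient(x_hat, C, M, E), fermat_quotient(Fraction(3), C, M, E))
```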


As another example of this technique, Fermat showed how to find
the extreme value of the more complicated function $f(x) = ax^2 - x^3$,
where $a$ is a given constant. As before, let $\hat{x}$ be the value of $x$ that
gives the extreme value of $f$, and let $E$ be “very small.” Then,

$$f(\hat{x}) \approx f(\hat{x} + E),$$

or

$$f(\hat{x}) - f(\hat{x} + E) \approx 0,$$

with the near-equality becoming an equality as $E \to 0$. So,

$$\left[a\hat{x}^2 - \hat{x}^3\right] - \left[a(\hat{x} + E)^2 - (\hat{x} + E)^3\right] \approx 0,$$

or, after expanding and canceling, and dividing through by $E$,

$$-2a\hat{x} - aE + 3\hat{x}^2 + 3\hat{x}E + E^2 \approx 0.$$

Then, letting $E \to 0$, the near-equality becomes an equality and

$$-2a\hat{x} + 3\hat{x}^2 = 0 = \hat{x}(-2a + 3\hat{x}).$$


There are two solutions: x = 0 and x = 2a/3. Notice that 


and 


So, with reference to figure 4.5, what these results say is that
$x = 2a/3$ gives a local maximum if $a > 0$ (in the left plot, where $a = 3$ and
$x = 2$) and a local minimum if $a < 0$ (in the right plot, where $a = -3$
and $x = -2$). The $x = 0$ solution gives a local minimum if $a > 0$
and a local maximum if $a < 0$. For both $x$ values, no matter what
the sign of $a$, there is no absolute or global minimum or maximum,
since $f(x)$ becomes unbounded as $x \to \pm\infty$.
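The classification read off figure 4.5 can be reproduced by simply comparing $f$ at each stationary point with its values a small step to either side. In this sketch the step size h = 0.001 is an arbitrary choice, and `classify` is an illustrative helper, not a standard routine.

```python
def f(x, a):
    """The function a x^2 - x^3 from the text."""
    return a * x**2 - x**3

def classify(x0, a, h=1e-3):
    """Label x0 a local max or min by comparing f there with nearby values."""
    left, mid, right = f(x0 - h, a), f(x0, a), f(x0 + h, a)
    if mid > left and mid > right:
        return "local max"
    if mid < left and mid < right:
        return "local min"
    return "neither"

for a in (3.0, -3.0):
    for x0 in (0.0, 2 * a / 3):  # the two stationary points Fermat's method yields
        print(a, x0, f(x0, a), classify(x0, a))
```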

The reason for emphasizing that the extrema of $f(x)$ are local is
that extrema are determined entirely by the behavior of the
function in the neighborhood of each extremum. It is entirely possible,
for example, to have a two-extrema function with its local minimum
larger than its local maximum. An example of this is shown in
figure 4.6, for the function $f(x) = x + (1/x)$ (which is, of course,
discontinuous at $x = 0$). The local minimum at $x = +1$ ($f(1) = 2$)
is larger than the local maximum at $x = -1$ ($f(-1) = -2$).
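A few lines of arithmetic confirm the figure 4.6 example. The derivative $1 - 1/x^2$ vanishes at $x = \pm 1$; the sketch evaluates $f$ there and at points 0.01 away (an arbitrary offset) to confirm that $x = 1$ is a local minimum, $x = -1$ is a local maximum, and the minimum's value exceeds the maximum's.

```python
def f(x):
    """f(x) = x + 1/x, undefined at x = 0."""
    return x + 1 / x

local_min, local_max = f(1.0), f(-1.0)
# f(1) should be below its neighbors and f(-1) above its neighbors
nearby_ok = all(f(1.0) < f(1.0 + d) and f(-1.0) > f(-1.0 + d) for d in (-0.01, 0.01))
print(local_min, local_max, local_min > local_max, nearby_ok)
```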

As a much more complicated example of his method, Fermat also
treated a geometric problem from Pappus' Mathematical Collection
(Proposition 61), one that leads (in modern algebraic notation) to
calculating the minimum of a ratio of two polynomials:

$$\frac{(a - x)(b + x)}{x(c - x)},$$

where $a$, $b$, and $c$ are given constants. This is not an easy problem!
You can find the original geometric statement in Alexander Jones'
translation of Book 7 of the Collection (Springer-Verlag 1986, p. 186).

FIGURE 4.5. Extrema. (Left panel: $f(x) = 3x^2 - x^3$; right panel: $f(x) = -3x^2 - x^3$.)
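For a feel of the Pappus ratio, here is a numerical sketch with hypothetical constants a = 4, b = 1, c = 3, chosen so the ratio is positive on $0 < x < c$ and blows up at both ends (Pappus' own constraints on a, b, c are not reproduced here). Setting the derivative of the quotient to zero reduces, after some algebra, to $(a - b - c)x^2 + 2abx - abc = 0$, which for these constants is $8x - 12 = 0$; the grid search agrees.

```python
def ratio(x, a, b, c):
    """The Pappus ratio (a - x)(b + x) / (x (c - x))."""
    return (a - x) * (b + x) / (x * (c - x))

a, b, c = 4.0, 1.0, 3.0  # hypothetical constants; the ratio blows up at x = 0 and x = c
xs = [c * k / 100_000 for k in range(1, 100_000)]  # grid on the open interval (0, c)
x_best = min(xs, key=lambda x: ratio(x, a, b, c))
# stationary condition: (a - b - c) x^2 + 2ab x - abc = 0, here 8x - 12 = 0, so x = 1.5
print(x_best, ratio(x_best, a, b, c))
```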


4.4 The Birth of the Derivative 


It is obvious, at this point, that in his examples Fermat was
essentially calculating the limit that we call today the first derivative,

$$\lim_{E \to 0} \frac{f(x + E) - f(x)}{E} = \frac{df}{dx} = f'(x),$$

and then setting it equal to zero. (Today's textbooks commonly use
$\epsilon$, or $\Delta x$, rather than $E$, in this definition.) This definition was not
formally introduced into mathematics until 1817, by the Czech
mathematician Bernard Bolzano (1781-1848), but the idea was in
Fermat's work long before 1817. Indeed, it was in print in Fermat's
1637 Method, five years before Newton was born and nine years
before Leibniz's birth (the two men normally credited with the
invention of the differential calculus). This tells us, for example, that
the derivative of any constant is zero, since $df = 0$ (constants don't
change!). There were, of course, others who also contributed to the
concept of the derivative, e.g., the Dutch mathematician Johann
Hudde (1628-1704), who in 1659 showed how to differentiate a
polynomial of any degree and so how to find its extrema. An
outstanding historical exposition on the evolution of the derivative is
the paper by Judith V. Grabiner, “The Changing Concept of Change:
The Derivative from Fermat to Weierstrass” (Mathematics Magazine,
September 1983, pp. 195-206). Grabiner starts off with the wonderful
observation “The derivative was first used; it was then discovered;
it was then explored and developed; and it was finally defined.”

FIGURE 4.6. A function with relative minimum > relative maximum.
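The limit definition can be watched converging. For the sample function $f(x) = x^3$ at $x = 2$ (an arbitrary choice), the quotient equals $12 + 6E + E^2$ exactly, so it approaches $f'(2) = 12$ as E shrinks; for a constant function the numerator, and hence the quotient, is zero for every E.

```python
def difference_quotient(f, x, E):
    """(f(x + E) - f(x)) / E; its limit as E -> 0 is the derivative f'(x)."""
    return (f(x + E) - f(x)) / E

def cube(x):
    return x**3

for E in (0.1, 0.01, 0.001, 1e-6):
    print(E, difference_quotient(cube, 2.0, E))

# a constant function: f(x + E) - f(x) = 0, so the quotient is zero
print(difference_quotient(lambda x: 5.0, 3.0, 0.001))
```

In floating point, E cannot be pushed all the way to zero; very small E eventually trades truncation error for rounding error.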
As a more sophisticated example of the use of the limit definition of the derivative, which will provide us with a result we’ll use in the next section, suppose we know how to differentiate some simple function of x, called g(x). For example, if g(x) = x or x², then the derivative is 1 or 2x, respectively (both results follow easily from the derivative definition; try it!). Then, our question is: what is the derivative of h(x) = √(c + g(x)), where c is a constant?

Using the definition of the derivative, we have 


$$\frac{dh}{dx}=\lim_{\varepsilon\to 0}\frac{h(x+\varepsilon)-h(x)}{\varepsilon}=\lim_{\varepsilon\to 0}\frac{\sqrt{c+g(x+\varepsilon)}-\sqrt{c+g(x)}}{\varepsilon}.$$

Now,

$$\frac{dg}{dx}=\lim_{\varepsilon\to 0}\frac{g(x+\varepsilon)-g(x)}{\varepsilon}$$

or, approximately, if ε is not equal to zero but “very close” to zero:

$$\varepsilon\,\frac{dg}{dx}+g(x)\approx g(x+\varepsilon).$$

Thus,

$$\frac{dh}{dx}=\lim_{\varepsilon\to 0}\frac{\sqrt{c+g(x)+\varepsilon\frac{dg}{dx}}-\sqrt{c+g(x)}}{\varepsilon}=\lim_{\varepsilon\to 0}\frac{\sqrt{c+g(x)}\left[\sqrt{1+\frac{\varepsilon}{c+g(x)}\frac{dg}{dx}}-1\right]}{\varepsilon}.$$

Next, and finally, using the approximation √(1 + u) ≈ 1 + ½u for u “small,” we have our result:

$$\frac{dh}{dx}=\lim_{\varepsilon\to 0}\frac{\sqrt{c+g(x)}\,\frac{\varepsilon}{2[c+g(x)]}\,\frac{dg}{dx}}{\varepsilon}=\frac{1}{2}\,\frac{1}{\sqrt{c+g(x)}}\,\frac{dg}{dx}.$$
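As a quick numerical sanity check of this result, here is a minimal Python sketch. The choices c = 3 and g(x) = x² are illustrative assumptions (any differentiable g would do), and the small step eps plays the role of ε. A symmetric difference quotient of h(x) = √(c + g(x)) should agree closely with (1/2)(1/√(c + g(x)))·dg/dx.

```python
import math

C = 3.0  # illustrative value of the constant c (an assumption)


def g(x: float) -> float:
    """Illustrative choice g(x) = x**2 (an assumption)."""
    return x * x


def dg_dx(x: float) -> float:
    """Derivative of the illustrative g(x) = x**2."""
    return 2.0 * x


def h(x: float) -> float:
    return math.sqrt(C + g(x))


def dh_dx_formula(x: float) -> float:
    # The derived result: (1/2) * (1/sqrt(c + g(x))) * dg/dx
    return 0.5 / math.sqrt(C + g(x)) * dg_dx(x)


def dh_dx_numeric(x: float, eps: float = 1e-6) -> float:
    # Symmetric difference quotient with a small, but nonzero, step eps
    return (h(x + eps) - h(x - eps)) / (2.0 * eps)


for x in (0.5, 1.0, 2.0):
    print(f"x = {x}: formula {dh_dx_formula(x):.10f}, numeric {dh_dx_numeric(x):.10f}")
```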


The modern notation for the derivative, e.g., dx/dt and d²x/dt² for the first and second derivatives of x(t) with respect to t (time), is due to Leibniz. In Newton’s notation they would be written as ẋ(t) and ẍ(t), respectively. Newton’s dot notation is still used today, but is generally regarded as less useful. Leibniz’s notation lends itself to the useful device of thinking of the differentials dx and dt as algebraic quantities, and to treating them as such. For example, a little later, in the next section, I’ll formally derive what is called the chain rule, but in Leibniz’s notation (and not in Newton’s) it is trivially obvious: if u(t) and v(t) are two functions of the independent variable t, and if f(t) = u{v(t)}, then

$$\frac{df}{dt}=\frac{du}{dv}\,\frac{dv}{dt}$$


“because” we can cancel the two dv differentials on the right-hand 
side. 

Even Newton’s name for the new math has been discarded. Find- 
ing his original motivation in considering how quantities change 
with the “flux of time” (in his Principia he writes of time as flow- 
ing), Newton called x(t) a flowing quantity, or fluent, and the rate 
at which x(t) changes with time (that is, the derivative of x(t)) the 
fluxion. The use of the word calculus, rather than Newton’s “method 
of fluxions,” is again due to Leibniz from some time before 1680. 
Newton himself had adopted Leibniz’s term by 1691. 

Much of the failure by Fermat to receive credit for his wonder- 
ful discovery is almost certainly due to the criticisms of Descartes, 
who simply failed to appreciate what he read in Method. This isn’t to 
say all mathematicians failed to appreciate Fermat’s contributions to 
the invention of the differential calculus. The Italian-born French 
mathematician Joseph Lagrange (1736-1813), who developed the 
modern approach to the calculus of variations (see chapter 6), wrote 
“One may regard Fermat as the first inventor of the new calculus.” 
And the French genius Pierre Simon de Laplace (1749-1827) de- 
clared “Fermat should be regarded, then, as the true discoverer of 
Differential Calculus.” Modern historians disagree, however, argu- 
ing that Fermat’s calculations are quite limited in scope, while New- 
ton and Leibniz developed calculus in breadth; in particular, they 
discovered general formulas for the differentiation of complicated
functions. Still, when Lewis Trenchard More published his 1934 
biography Isaac Newton (Charles Scribner’s Sons) he announced 
(p. 185) that he had discovered, in the major archival holdings of 
Newton’s papers, a previously unknown draft of a letter in which 
Newton himself stated his debt to Fermat for the invention of the 
differential calculus: “I had the hint of this method from Fermat’s 
way of drawing tangents and by applying it to abstract equations, 
directly and indirectly, I made it general.” 

To see why Newton wrote those words, consider again the parabola x = −y² shown in figure 4.4. Recall that Fermat took the tangent at B as the dashed line through B that intersects the x-axis at E. Dropping the perpendicular from B to the x-axis (to C), then, reduces the problem of drawing the tangent to determining just where E is located, i.e., to determining the length of CE. Fermat’s ultimate method for doing this (developed after his “double-root” approach) was to take O as an arbitrary point (between B and E) on the tangent line and then to drop the perpendicular from O to the x-axis (to I). Point A is the intersection of this perpendicular with the parabola. From the equation of that curve (remember, D is the origin) we have the distance relationships

$$CD=(BC)^2,\qquad ID=(AI)^2,$$

and so

$$\frac{(BC)^2}{(AI)^2}=\frac{CD}{ID}.$$

Since OI > AI, then

$$\frac{(BC)^2}{(OI)^2}<\frac{CD}{ID}.$$

By similar triangles we also have

$$\frac{BC}{CE}=\frac{OI}{IE},$$

or

$$\frac{(BC)^2}{(OI)^2}=\frac{(CE)^2}{(IE)^2}.$$

Thus,

$$\frac{(CE)^2}{(IE)^2}<\frac{CD}{ID}.$$

Now, let CD = d, CE = a, and CI = e. Since B (and so C) is given, Fermat knew the value of d. The value of a is what Fermat wanted to calculate, while the value of e is variable as it depends on the choice for O (which determines I and so CI). In any case, we have ID = CD − CI = d − e, and IE = CE − CI = a − e, and so

$$\frac{a^2}{(a-e)^2}<\frac{d}{d-e}$$

or

$$a^2(d-e)<d(a-e)^2.$$

With a little algebra this becomes

$$2ade<a^2e+de^2.$$

To complete his argument, Fermat let O move ever closer to B, and this would of course move I ever closer to C, and so e → 0. But, before doing that, e ≠ 0, and so we can divide by e to get

$$2ad<a^2+de.$$

Then letting e → 0 transforms the inequality into an equality (obvious from the geometry of figure 4.4) and so 2ad = a², or d (= CD) = ½a (= CE). This is, of course, the same result he obtained from the double-root method, but this is the technique that so inspired Newton in his development of the differential calculus.

Fermat’s technique didn’t inspire Descartes, however, who thought the approach was not general. He believed it would work only if an explicit relation of the form y = y(x) could be written. In what he thought would convince Fermat (and others) that Fermat’s method wouldn’t be able to handle a curve more complicated than a mere parabola, Descartes challenged Fermat (in 1638) to apply it to the curve x³ + y³ = 3axy, where a is a given positive constant. Notice that x and y cannot be separated in this equation into the form y = y(x). It is amusing to learn that Descartes failed to draw his own curve correctly, while Fermat was able to quickly determine the tangent to it (the curve is now known as the “folium of Descartes”).
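Today the tangent to the folium is usually found by implicit differentiation, a modern technique rather than Fermat’s own procedure. Differentiating x³ + y³ = 3axy term by term gives 3x² + 3y²y′ = 3a(y + xy′), so y′ = (ay − x²)/(y² − ax). The Python sketch below checks that slope against a finite-difference slope taken along the curve. The parametrization x = 3at/(1 + t³), y = 3at²/(1 + t³) and the value a = 1 are standard choices assumed here, not taken from the text.

```python
def folium_slope(x: float, y: float, a: float) -> float:
    """dy/dx on the folium x**3 + y**3 = 3*a*x*y, by implicit differentiation."""
    # 3x^2 + 3y^2 y' = 3a(y + x y')  =>  y' = (a*y - x^2) / (y^2 - a*x)
    denom = y * y - a * x
    if denom == 0.0:
        raise ZeroDivisionError("vertical tangent or the singular point at the origin")
    return (a * y - x * x) / denom


def folium_point(t: float, a: float) -> tuple[float, float]:
    """A point on the folium from x = 3at/(1+t^3), y = 3at^2/(1+t^3)."""
    s = 1.0 + t**3
    return 3.0 * a * t / s, 3.0 * a * t * t / s


def parametric_slope(t: float, a: float, dt: float = 1e-6) -> float:
    """Finite-difference slope (change in y)/(change in x) along the curve near t."""
    x1, y1 = folium_point(t - dt, a)
    x2, y2 = folium_point(t + dt, a)
    return (y2 - y1) / (x2 - x1)


a = 1.0
for t in (0.5, 1.0, 2.0):
    x, y = folium_point(t, a)
    print(f"({x:.4f}, {y:.4f}): implicit {folium_slope(x, y, a):.6f}, "
          f"parametric {parametric_slope(t, a):.6f}")
```

At t = 1 the point is (3a/2, 3a/2), the tip of the folium’s loop, where both approaches give a slope of −1.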


4.5 Derivatives and Tangents 


The intimate connection between the derivative of a function f(x) 
and the tangent to the curve y = f(x) was used by Newton to solve 
the practical problem of calculating the roots to the equation f(x) = 
0. As is now well known, if f(x) is a polynomial of degree greater 
than four then there is no analytic solution, in general. What is 
called “Newton’s method” is an iterative, numerical technique (see 
the next box) that can find the solutions to f(x) = 0 quickly, to any 
degree of accuracy desired, even in cases where f(x) is a polynomial 
of infinite degree, e.g., f(x) = x − cos(x). Newton wrote up his
discovery in 1671, as part of his book Methodus fluxionum et serierum 
infinitarum, but it wasn’t actually published until 1736. Meanwhile, 
in 1690, the English mathematician Joseph Raphson (1648-1715) 
published the same method in his Analysis aequationum universalis. 
In modern calculus textbooks, this method (easily programmed on 
a computer) is often called the Newton-Raphson method in honor 
of both men. 

To understand the geometry behind the Newton-Raphson method, let’s consider the continuous function f(x) = x³ − 2x − 5, the same function used by Newton in his Method of Fluxions to illustrate the method. It is easy to calculate that f(2) and f(3) have opposite algebraic signs, and so there must be some value of x = x̄ (between 2 and 3) where f(x̄) = 0. Figure 4.7, which plots f(x), shows that x̄ is actually between 2 and 2.5, but suppose we want to find x̄ much more precisely, e.g., accurate to, let’s say, ten decimal places? How can we do that?

The Newton-Raphson method generates a sequence of values, xₙ for n = 1, 2, 3, …, that approach x̄, i.e., xₙ → x̄ as n → ∞. That is, given the value xₙ, the method then calculates an xₙ₊₁ that is closer to x̄: |xₙ₊₁ − x̄| < |xₙ − x̄|. The Newton-Raphson method can then use xₙ₊₁ to calculate xₙ₊₂, and so on, until we have the accuracy we desire. Here’s how it works.
FIGURE 4.7. Newton’s function.


The derivative of f(x) at x = xₙ is f′(xₙ), which is the slope of the line tangent to y = f(x) at x = xₙ. This tangent line thus has the equation

$$y=f'(x_n)\,x+b,$$

where b is a constant. But, since y = f(xₙ) at x = xₙ, then

$$f(x_n)=f'(x_n)\,x_n+b,$$

and so b = f(xₙ) − f′(xₙ)xₙ. Thus, the tangent line has the equation

$$y=f'(x_n)\,x+f(x_n)-f'(x_n)\,x_n.$$

This tangent line crosses the x-axis (and so y = 0) at x = xₙ₊₁, our next (often better, although not always; see figures 4.8a and 4.8b) approximation to x̄. Thus,

$$0=f'(x_n)\,x_{n+1}+f(x_n)-f'(x_n)\,x_n,$$

or, solving for xₙ₊₁, we have our result:

$$x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}.$$

For Newton’s example, f′(x) = 3x² − 2 and so the iterative algorithm for solving f(x) = 0 is

$$x_{n+1}=x_n-\frac{x_n^3-2x_n-5}{3x_n^2-2}=\frac{2x_n^3+5}{3x_n^2-2}.$$

FIGURE 4.8. Geometry of the Newton-Raphson method. (a) f(x), with a tangent line and the actual root to f(x) = 0: Newton-Raphson converges. (b) f(x), with a tangent line: Newton-Raphson does NOT converge.
All we need now, to use this algorithm, is a “starting value” for the sequence of xₙ, i.e., the value of x₀, the obvious choice for which is 2. Subsequent values generated by the algorithm are

$$\begin{aligned}x_1&=2.1\\x_2&=2.09456812110419\\x_3&=2.09455148169820\\x_4&=2.09455148154233\\x_5&=2.09455148154233,\end{aligned}$$

and so, after just four iterations, we have the value of x̄ to better than ten decimal places. The Newton-Raphson method itself is nothing but arithmetic, but it is fundamentally based on the connection between the derivative of a function and the tangent line (at a given point) to the curve determined by that function.
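Because the method is “nothing but arithmetic,” it takes only a few lines to program. The Python sketch below reproduces the iterates above for f(x) = x³ − 2x − 5, starting from x₀ = 2. The function name newton_raphson, the stopping tolerance, and the iteration cap are illustrative choices. As a contrast in the spirit of figure 4.8b, the sketch then applies the same update to the real cube root f(x) = ∛x. This is a standard textbook example, not one from this chapter: each step lands at about −2 times the previous value, so the iterates run away from the root at 0.

```python
import math


def newton_raphson(f, fprime, x0, tol=1e-14, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n); return every iterate, starting with x0."""
    xs = [x0]
    for _ in range(max_iter):
        x = xs[-1]
        slope = fprime(x)
        if slope == 0.0:
            raise ZeroDivisionError("horizontal tangent: it never crosses the x-axis")
        x_next = x - f(x) / slope  # where the tangent line at x crosses the x-axis
        xs.append(x_next)
        if abs(x_next - x) < tol:  # successive iterates agree, so stop
            break
    return xs


def f(x):
    return x**3 - 2*x - 5  # Newton's own example


def fprime(x):
    return 3*x**2 - 2


iterates = newton_raphson(f, fprime, 2.0)
for n, x in enumerate(iterates):
    print(f"x_{n} = {x:.14f}")


def cbrt(x):
    return math.copysign(abs(x) ** (1.0 / 3.0), x)  # real cube root, sign preserved


def cbrt_prime(x):
    return abs(x) ** (-2.0 / 3.0) / 3.0


bad = newton_raphson(cbrt, cbrt_prime, 0.1, max_iter=5)
print(bad)  # the iterates move away from the root at 0
```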


Relatively recent scholarship, I should tell you, convincingly argues that neither Newton nor Raphson should have this
method named after them! The method I just illustrated is both 
iterative and employs the derivative concept. Newton’s own, 
specific calculation of the solution to the cubic has neither 
feature, and Raphson’s method does not use derivatives (al- 
though it is iterative). It was actually the English mathemati- 
cian Thomas Simpson (1710-61) who published the modern 
algorithm in 1740. For more on this interesting story, which 
has not yet (as far as I know) been incorporated into modern 
textbooks on the history of mathematics, see Nick Kollerstrom, 
“Thomas Simpson and ‘Newton’s Method of Approximation’: 
An enduring myth” (The British Journal for the History of Science, 
September 1992, pp. 347-54). 


Fermat came as close as one could to discovering the derivative 
without actually making the discovery. An “infinitesimal miss,” yes, 
but for the credit of being declared the inventor of the differential 
calculus it made all the difference in the world. He could have taken 
the final step, too, as I'll illustrate in the next section on how Fermat
finally constructed a proper derivation of Snell’s law. 

As one last example of the connection between derivatives and 
tangents, consider the problem of calculating the derivative of the 
function f(x) = ln(x). Using Fermat’s idea, let’s write

$$\frac{df}{dx}=\lim_{\Delta x\to 0}\frac{\ln(x+\Delta x)-\ln(x)}{\Delta x}=\lim_{\Delta x\to 0}\frac{\ln\left(\frac{x+\Delta x}{x}\right)}{\Delta x}=\lim_{\Delta x\to 0}\frac{1}{\Delta x}\ln\left(1+\frac{\Delta x}{x}\right)=\lim_{\Delta x\to 0}\ln\left(1+\frac{\Delta x}{x}\right)^{1/\Delta x}.$$

Recall now that (1 + a/s)ˢ → eᵃ as s → ∞. If you don’t recall this, there is a nice noncalculus derivation of it, using the binomial theorem, in Eli Maor’s e: The Story of a Number (Princeton University Press, 1994, p. 35). So, with s = 1/Δx and a = 1/x, we have

$$\frac{d}{dx}\ln(x)=\lim_{\Delta x\to 0}\ln\left(1+\frac{\Delta x}{x}\right)^{1/\Delta x}=\lim_{s\to\infty}\ln\left(1+\frac{1/x}{s}\right)^{s}=\ln\left(e^{1/x}\right)=\frac{1}{x}.$$


Figure 4.9 shows plots of ln(x) and 1/x, and it is immediately obvious that 1/x does indeed “look like” the slope of ln(x). I’ll use this result in the opening section of the next chapter to answer a famous “puzzle problem” in mathematics.
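Both limits in this argument can be watched numerically. In the Python sketch below, the sample point x = 2 and the step sizes are arbitrary choices. The quotient (1/Δx)·ln(1 + Δx/x) settles toward 1/x as Δx shrinks, and (1 + a/s)ˢ with a = 1/x creeps toward e^(1/x) as s grows.

```python
import math


def ln_quotient(x: float, dx: float) -> float:
    # (1/dx) * ln(1 + dx/x), whose limit as dx -> 0 is d/dx ln(x);
    # math.log1p(u) computes ln(1 + u) accurately when u is small
    return math.log1p(dx / x) / dx


def compound(a: float, s: float) -> float:
    # (1 + a/s)**s, which approaches e**a as s grows without bound
    return (1.0 + a / s) ** s


x = 2.0
for dx in (0.1, 0.01, 0.001, 1e-6):
    print(f"dx = {dx:g}: quotient = {ln_quotient(x, dx):.9f}   1/x = {1.0 / x}")

for s in (10.0, 1e3, 1e6):
    print(f"s = {s:g}: (1 + a/s)^s = {compound(1.0 / x, s):.9f}   "
          f"e^(1/x) = {math.exp(1.0 / x):.9f}")
```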

FIGURE 4.9. The natural log function and its derivative.

One of the most valuable of the differentiation rules tells us how to differentiate what are called composite functions. For example, we just learned what the derivative of ln(x) is, but what is the derivative of ln{v(x)}, where v(x) is any function of x, not simply v(x) = x? What, for example, is the derivative of ln{ln(x)}? The very definition of the derivative is the key to answering this. So, suppose u = u(v) and v = v(x), and that we already know how to differentiate u(v) and v(x) individually. We can find the derivative of u{v(x)} by first writing

$$\frac{du}{dv}=\lim_{\Delta v\to 0}\frac{u(v+\Delta v)-u(v)}{\Delta v}\quad\text{and}\quad\frac{dv}{dx}=\lim_{\Delta x\to 0}\frac{v(x+\Delta x)-v(x)}{\Delta x}.$$

Also,

$$\frac{d}{dx}u\{v(x)\}=\lim_{\Delta x\to 0}\frac{u\{v(x+\Delta x)\}-u\{v(x)\}}{\Delta x}=\lim_{\Delta x\to 0}\frac{u\{v(x+\Delta x)\}-u\{v(x)\}}{v(x+\Delta x)-v(x)}\cdot\frac{v(x+\Delta x)-v(x)}{\Delta x}.$$

Now, by definition Δv = v(x + Δx) − v(x), and so v(x + Δx) = v(x) + Δv, which means that

$$\frac{d}{dx}u\{v(x)\}=\lim_{\Delta x\to 0}\frac{u\{v(x)+\Delta v\}-u\{v(x)\}}{\Delta v}\cdot\frac{v(x+\Delta x)-v(x)}{\Delta x}.$$

Since Δv → 0 as Δx → 0, we thus have

$$\frac{d}{dx}u\{v(x)\}=\lim_{\Delta v\to 0}\frac{u(v+\Delta v)-u(v)}{\Delta v}\cdot\lim_{\Delta x\to 0}\frac{v(x+\Delta x)-v(x)}{\Delta x}=\frac{du}{dv}\,\frac{dv}{dx},$$
a result commonly called the chain rule and known to Leibniz no later than 1676. I used it in chapter 1 (in the minimum escape velocity problem of section 1.6), and I’ll use it in the next chapter to solve a famous problem from 1686.

Now, to answer our original question on how to differentiate ln{v(x)}, we have u = ln(v) and v = v(x), and so

$$\frac{d}{dx}\ln\{v(x)\}=\frac{d}{dv}\ln(v)\cdot\frac{dv}{dx}=\frac{1}{v}\,\frac{dv}{dx}.$$

For example, if v(x) = ln(x), then we have

$$\frac{d}{dx}\ln\{\ln(x)\}=\frac{1}{\ln(x)}\cdot\frac{1}{x}=\frac{1}{x\ln(x)},$$

which of course is defined only for x > 1.
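This example is also easy to spot-check. The Python sketch below compares 1/(x ln(x)) with a symmetric difference quotient of ln{ln(x)}. The sample points are arbitrary, all with x > 1, and the step size is an illustrative choice.

```python
import math


def ln_ln(x: float) -> float:
    return math.log(math.log(x))  # real-valued only for x > 1


def chain_rule(x: float) -> float:
    # (d/dv) ln(v) * dv/dx with v = ln(x): (1/ln(x)) * (1/x)
    return 1.0 / (x * math.log(x))


def central_difference(func, x: float, step: float = 1e-6) -> float:
    return (func(x + step) - func(x - step)) / (2.0 * step)


for x in (1.5, math.e, 10.0):
    print(f"x = {x:.6f}: chain rule {chain_rule(x):.9f}, "
          f"numeric {central_difference(ln_ln, x):.9f}")
```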
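And finally, we can turn all of this on its head and calculate the derivative of f(x) = eˣ. This means x = ln f(x), and so

$$\frac{d}{dx}x=1=\frac{d}{dx}\ln f(x).$$

But, our result for composite functions says

$$\frac{d}{dx}\ln f(x)=\left\{\frac{d}{df}\ln f\right\}\frac{df}{dx}=\frac{1}{f}\,\frac{df}{dx}.$$

So,

$$1=\frac{1}{f}\,\frac{df}{dx},$$

or

$$\frac{df}{dx}=f,\quad\text{that is,}\quad\frac{d}{dx}e^x=e^x.$$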


The exponential function is its own derivative. 

Two highly useful results that immediately follow from this unique property of the exponential are the derivatives of the hyperbolic functions. Thus, with A some constant, if

$$f(x)=\cosh(Ax)=\frac{e^{Ax}+e^{-Ax}}{2},\qquad g(x)=\sinh(Ax)=\frac{e^{Ax}-e^{-Ax}}{2},$$

then, since the chain rule gives the derivative of e^(Ax) as Ae^(Ax),

$$\frac{df}{dx}=\frac{d}{dx}\cosh(Ax)=\frac{Ae^{Ax}-Ae^{-Ax}}{2}=A\sinh(Ax),$$

$$\frac{dg}{dx}=\frac{d}{dx}\sinh(Ax)=\frac{Ae^{Ax}+Ae^{-Ax}}{2}=A\cosh(Ax).$$


These formulas will be very helpful in chapter 6. 
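These results, and the fact that eˣ is its own derivative, are easy to spot-check numerically. In the Python sketch below, A = 1.7 and the sample points are arbitrary illustrative choices. The functions cosh(Ax) and sinh(Ax) are built directly from their exponential definitions and differentiated with symmetric difference quotients.

```python
import math

A = 1.7  # illustrative value of the constant A (an assumption)


def central_difference(func, x: float, step: float = 1e-6) -> float:
    return (func(x + step) - func(x - step)) / (2.0 * step)


def cosh_a(x: float) -> float:
    return (math.exp(A * x) + math.exp(-A * x)) / 2.0


def sinh_a(x: float) -> float:
    return (math.exp(A * x) - math.exp(-A * x)) / 2.0


for x in (-1.0, 0.0, 0.8):
    print(f"x = {x}")
    print("  d/dx e^x      ", central_difference(math.exp, x), "vs", math.exp(x))
    print("  d/dx cosh(Ax) ", central_difference(cosh_a, x), "vs", A * sinh_a(x))
    print("  d/dx sinh(Ax) ", central_difference(sinh_a, x), "vs", A * cosh_a(x))
```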


4.6 Snell’s Law and the Principle of Least Time 


Fermat’s solution to finding a physically correct derivation of Snell’s 
law of refraction was the result of developing a generalization of 
Heron’s derivation of the reflection law. Using Heron’s original mini- 
mum-path-length criterion wouldn’t work for refraction, of course, 
as that path would simply be the straight line connecting A and B 
(in figure 4.10, where A and B have a lateral separation of d), rather 
than the actual broken path ARB. Fermat’s generalization was to 
argue that the correct path, for both reflection and refraction, is the 
path of minimum time. For reflection, where the light is always in 
the same medium, minimum length and minimum time give the 
same path. But for refraction, the paths are different, and the least- 
time path is indeed the actual path. In the notation of figure 4.10, 
then, the total transit time from A to B is 


$$T=\frac{\sqrt{h_1^2+x^2}}{v_1}+\frac{\sqrt{h_2^2+(d-x)^2}}{v_2}.$$


The mathematical problem for Fermat was to determine x = x̄ so that T is minimized. One of the reasons why Fermat is not recognized as the inventor of the differential calculus is that he failed to discover the rules for applying his basic idea of f(x + E) ≈ f(x) for E “small” to functions more complicated than simple polynomials, e.g., to the square roots that appear in the formula for T. Fermat was, however, through some special algebraic manipulations, still able to solve this specific problem. Here’s how.

FIGURE 4.10. Geometry of Snell’s law from the principle of least time.

Let’s start by observing from figure 4.10 that

$$\sin(\theta_i)=\frac{\bar{x}}{\sqrt{h_1^2+\bar{x}^2}},\qquad\sin(\theta_r)=\frac{d-\bar{x}}{\sqrt{h_2^2+(d-\bar{x})^2}}.$$


Then, using T(x̄) − T(x̄ + E) ≈ 0 for E “nearly zero,” we have

$$\left[\frac{\sqrt{h_1^2+\bar{x}^2}}{v_1}+\frac{\sqrt{h_2^2+(d-\bar{x})^2}}{v_2}\right]-\left[\frac{\sqrt{h_1^2+(\bar{x}+E)^2}}{v_1}+\frac{\sqrt{h_2^2+(d-\bar{x}-E)^2}}{v_2}\right]\approx 0.$$


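Look now at the square root in the first term in the second pair of brackets; we can write it as (since E is “nearly zero,” E² is even “more nearly zero”)

$$\sqrt{h_1^2+(\bar{x}+E)^2}=\sqrt{h_1^2+\bar{x}^2+2\bar{x}E+E^2}\approx\sqrt{h_1^2+\bar{x}^2+2\bar{x}E}=\sqrt{h_1^2+\bar{x}^2}\,\sqrt{1+\frac{2\bar{x}E}{h_1^2+\bar{x}^2}}.$$

Recalling once again the approximation √(1 + u) ≈ 1 + ½u for u “small,” we arrive at

$$\sqrt{h_1^2+(\bar{x}+E)^2}\approx\sqrt{h_1^2+\bar{x}^2}\left[1+\frac{\bar{x}E}{h_1^2+\bar{x}^2}\right].$$

Repeating this process for the second square root (and using √(1 − u) ≈ 1 − ½u) results in

$$\sqrt{h_2^2+(d-\bar{x}-E)^2}\approx\sqrt{h_2^2+(d-\bar{x})^2}\left[1-\frac{(d-\bar{x})E}{h_2^2+(d-\bar{x})^2}\right].$$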


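We thus have

$$T(\bar{x})-T(\bar{x}+E)=\left[\frac{\sqrt{h_1^2+\bar{x}^2}}{v_1}+\frac{\sqrt{h_2^2+(d-\bar{x})^2}}{v_2}\right]-\left[\frac{\sqrt{h_1^2+\bar{x}^2}}{v_1}\left\{1+\frac{\bar{x}E}{h_1^2+\bar{x}^2}\right\}+\frac{\sqrt{h_2^2+(d-\bar{x})^2}}{v_2}\left\{1-\frac{(d-\bar{x})E}{(d-\bar{x})^2+h_2^2}\right\}\right]$$

$$=\frac{\sqrt{h_2^2+(d-\bar{x})^2}}{v_2}\,\frac{(d-\bar{x})E}{(d-\bar{x})^2+h_2^2}-\frac{\sqrt{h_1^2+\bar{x}^2}}{v_1}\,\frac{\bar{x}E}{h_1^2+\bar{x}^2}\approx 0.$$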


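Next, dividing through by E (which is not yet exactly zero) and then imagining E vanishes, we arrive at the equality

$$\frac{\sqrt{h_2^2+(d-\bar{x})^2}}{v_2}\,\frac{d-\bar{x}}{(d-\bar{x})^2+h_2^2}=\frac{\sqrt{h_1^2+\bar{x}^2}}{v_1}\,\frac{\bar{x}}{h_1^2+\bar{x}^2},$$

or

$$\frac{1}{v_2}\,\frac{d-\bar{x}}{\sqrt{h_2^2+(d-\bar{x})^2}}=\frac{1}{v_1}\,\frac{\bar{x}}{\sqrt{h_1^2+\bar{x}^2}}.$$

But this is just

$$\frac{1}{v_2}\sin(\theta_r)=\frac{1}{v_1}\sin(\theta_i),$$

or, at last,

$$\frac{\sin(\theta_i)}{\sin(\theta_r)}=\frac{v_1}{v_2}=\text{constant}.$$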


I say “at last” because while this is once again our now familiar Snell’s law, now we (Fermat) have the constant right! It is v₁/v₂, the inverse of Descartes’ result of v₂/v₁, which had forced him to conclude that v₂ (= the speed of light in water) > v₁ (= the speed of light in air) because experiment shows the constant in Snell’s law is greater than one. For Fermat, however, the conclusion was just the reverse: v₂ < v₁.
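Fermat’s result can also be checked without any calculus at all, much in his own spirit: simply search for the x that minimizes T. The Python sketch below uses golden-section search, a standard derivative-free minimizer. That choice is mine, not anything Fermat used. The values h₁ = 1, h₂ = 1.5, d = 3, v₁ = 1, and v₂ = 0.75 are illustrative assumptions. After finding the minimizing x̄, the sketch compares sin(θᵢ)/sin(θᵣ) with v₁/v₂.

```python
import math

H1, H2, D = 1.0, 1.5, 3.0  # illustrative h1, h2, d (assumptions)
V1, V2 = 1.0, 0.75          # illustrative speeds, slower in medium 2 (assumptions)


def travel_time(x: float) -> float:
    """T(x) = sqrt(h1^2 + x^2)/v1 + sqrt(h2^2 + (d - x)^2)/v2."""
    return math.hypot(H1, x) / V1 + math.hypot(H2, D - x) / V2


def golden_section_min(func, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Shrink [lo, hi] around the minimum of a unimodal func, using function values only."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) < func(d):
            b, d = d, c  # minimum lies in [a, d]; old c becomes the new d
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d  # minimum lies in [c, b]; old d becomes the new c
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


x_bar = golden_section_min(travel_time, 0.0, D)
sin_i = x_bar / math.hypot(H1, x_bar)
sin_r = (D - x_bar) / math.hypot(H2, D - x_bar)
print(f"x_bar = {x_bar:.8f}, sin_i/sin_r = {sin_i / sin_r:.8f}, v1/v2 = {V1 / V2:.8f}")
```

Because T is a sum of two convex functions of x, it has a single minimum on [0, d], which is what golden-section search needs.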

Fermat was both astonished and pleased at this success of his prin- 
ciple of least time, as it simultaneously explained how Descartes’ 
result could be in agreement with experiment and at the same time 
wrong in its conclusion about the speed of light in different mediums. A modern student would, of course, be perplexed at all of the algebra Fermat used. She would wonder why he hadn’t simply set the derivative of T equal to zero to find Snell’s law. The answer is,
as I mentioned earlier, that Fermat didn’t know how to do that. But 
he was so very close. 

Indeed, the general differentiation formulas for Fermat’s problem 
are not hard to develop and, in June 1682, Leibniz carried out the 
following analysis. Looking at the expression for T, we see that we 
have just two fundamental forms: if c₁ and c₂ are constants, the forms are

$$g(x)=c_1+(c_2-x)^2,\qquad h(x)=\sqrt{c+g(x)}.$$


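In section 4.4 you saw how to differentiate h(x). To differentiate g(x), we write (using Fermat’s basic idea) in modern notation,

$$\frac{dg}{dx}=\lim_{\varepsilon\to 0}\frac{g(x+\varepsilon)-g(x)}{\varepsilon}.$$

So,

$$\begin{aligned}\frac{dg}{dx}&=\lim_{\varepsilon\to 0}\frac{[c_1+\{c_2-(x+\varepsilon)\}^2]-[c_1+(c_2-x)^2]}{\varepsilon}\\&=\lim_{\varepsilon\to 0}\frac{[c_1+c_2^2-2c_2(x+\varepsilon)+(x+\varepsilon)^2]-[c_1+c_2^2-2c_2x+x^2]}{\varepsilon}\\&=\lim_{\varepsilon\to 0}\frac{-2c_2(x+\varepsilon)+(x+\varepsilon)^2+2c_2x-x^2}{\varepsilon}\\&=\lim_{\varepsilon\to 0}\frac{-2c_2x-2c_2\varepsilon+x^2+2x\varepsilon+\varepsilon^2+2c_2x-x^2}{\varepsilon}\\&=\lim_{\varepsilon\to 0}\frac{-2c_2\varepsilon+2x\varepsilon+\varepsilon^2}{\varepsilon}=\lim_{\varepsilon\to 0}(-2c_2+2x+\varepsilon)\\&=-2(c_2-x).\end{aligned}$$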


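And from section 4.4 we have

$$\frac{dh}{dx}=\frac{1}{2}\,\frac{1}{\sqrt{c+g(x)}}\,\frac{dg}{dx}.$$

With these formulas in hand, the modern student would take T, written as

$$T=\frac{\sqrt{h_1^2+x^2}}{v_1}+\frac{\sqrt{h_2^2+(d-x)^2}}{v_2},$$

and, as did Leibniz, write (by inspection)

$$\frac{dT}{dx}=\frac{1}{v_1}\,\frac{1}{2}\,\frac{1}{\sqrt{h_1^2+x^2}}\,[2x]+\frac{1}{v_2}\,\frac{1}{2}\,\frac{1}{\sqrt{h_2^2+(d-x)^2}}\,[-2(d-x)]=0,$$

or

$$\frac{x}{v_1\sqrt{h_1^2+x^2}}=\frac{d-x}{v_2\sqrt{h_2^2+(d-x)^2}}.$$

Recalling the expressions for sin(θᵢ) and sin(θᵣ), this immediately reduces to Snell’s law,

$$\frac{\sin(\theta_i)}{\sin(\theta_r)}=\frac{v_1}{v_2}.$$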
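Leibniz’s derivative can be put straight to work. The Python sketch below uses the same illustrative values as before, h₁ = 1, h₂ = 1.5, d = 3, v₁ = 1, and v₂ = 0.75, all of which are assumptions. Here dT/dx is negative at x = 0 and positive at x = d, so simple bisection locates the x where it vanishes, and Snell’s ratio follows.

```python
import math

H1, H2, D = 1.0, 1.5, 3.0  # illustrative h1, h2, d (assumptions)
V1, V2 = 1.0, 0.75          # illustrative speeds (assumptions)


def dT_dx(x: float) -> float:
    # Leibniz's result: x/(v1*sqrt(h1^2 + x^2)) - (d - x)/(v2*sqrt(h2^2 + (d - x)^2))
    return x / (V1 * math.hypot(H1, x)) - (D - x) / (V2 * math.hypot(H2, D - x))


def bisect(func, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Root of func in [lo, hi], assuming func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0.0:
        raise ValueError("func does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, f_mid  # same sign as the left end: root is in the upper half
        else:
            hi = mid               # sign change in the lower half
    return 0.5 * (lo + hi)


x_bar = bisect(dT_dx, 0.0, D)
sin_i = x_bar / math.hypot(H1, x_bar)
sin_r = (D - x_bar) / math.hypot(H2, D - x_bar)
print(f"x_bar = {x_bar:.12f}, sin_i/sin_r = {sin_i / sin_r:.12f}, v1/v2 = {V1 / V2:.12f}")
```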


Fermat’s principle of least time does strike many as lying outside of mathematics, and perhaps even outside of physics as well, as being metaphysical. Of course, Heron’s derivation of the law of reflection from the principle of minimum distance is open to the same criticism (and, obviously, minimum distance is equivalent to minimum time for travel always in the same medium, and so Heron’s principle is simply a special case of Fermat’s). Students always want to know how light “knows,” at the start of a journey, what path will result in minimum length (time). That seems to require light to be prescient! Before the development of quantum electrodynamics, which explains how light “knows,” Fermat’s principle did have to be taken on faith, and for many that was too much to ask. Fermat himself was not sympathetic to those who rejected the least-time principle on the grounds that it asked for light to know where it was going before it started. As he replied in an (unconscious?) pun to one of his critics, “I do not pretend to be in the secret confidence of nature. She works by paths obscure and hidden. . . .”

In fact, Fermat’s principle of least time is not always correct. The 
modern statement of the principle says the path a light beam follows 
is simply a stationary path, which means that a slight variation in the optical path leaves the travel time unchanged (to first order). This may indeed
result in a path with minimum travel time, but another possibility 
is a path with maximum travel time! To see how such a thing could 
happen, imagine a point source of light in the center (point O) of 
an ellipsoidal mirror, as shown in figure 4.11. There are four points 
around the mirror (A, B, C, and D) which reflect light directly back 
to O. Two of them (A and B) determine minimum time paths, while 
the other two (C and D) determine maximum time paths. 
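A two-dimensional cross-section makes the point concrete. In the Python sketch below, the mirror is the ellipse (a·cos t, b·sin t) centered on O, with illustrative semi-axes a = 2 and b = 1 (assumptions). The out-and-back path length from O to a mirror point is proportional to the travel time. Its derivative is zero at the four ends of the axes, which for a ≠ b are the only points where the mirror sends light straight back to O. The length is smallest at the ends of the minor axis and largest at the ends of the major axis.

```python
import math

SEMI_A, SEMI_B = 2.0, 1.0  # illustrative semi-axes of the elliptical cross-section (assumptions)


def round_trip(t: float) -> float:
    """Length of the path O -> (a cos t, b sin t) -> O, proportional to the travel time."""
    return 2.0 * math.hypot(SEMI_A * math.cos(t), SEMI_B * math.sin(t))


def slope(func, t: float, step: float = 1e-6) -> float:
    return (func(t + step) - func(t - step)) / (2.0 * step)


axis_ends = {
    "end of major axis": 0.0,
    "end of minor axis": math.pi / 2,
    "other end of major axis": math.pi,
    "other end of minor axis": 3 * math.pi / 2,
}
for name, t in axis_ends.items():
    kind = "minimum" if round_trip(t) < round_trip(t + 0.1) else "maximum"
    print(f"{name}: length {round_trip(t):.6f}, slope {slope(round_trip, t):.1e}, local {kind}")
```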

FIGURE 4.11. Minimum and maximum time paths are stationary paths.

The criticism Fermat received about the principle of least time was slight indeed compared to that which descended upon Pierre Louis Moreau de Maupertuis (1698-1759) over his so-called principle of least action. A number of people, long before Fermat (and probably even before Heron), had thought that a universe made by God must
be a perfect universe, and consequently should always operate with 
economy. Leonardo da Vinci, for example, who wasn’t really even 
a very good mathematician, nevertheless was a thoughtful intellect 
and declared (more than a century before Fermat) that “Every action 
done by nature is done in the shortest way.” He failed, however, to 
explain just what that might mean. Fermat added an explanation 
for the case of light, but Maupertuis went light-years further by both 
defining action and claiming “least action” to be universally appli- 
cable: “in all the changes that take place in the universe, the sum of 
the products of each body multiplied by the distance it moves and 
by the speed with which it moves is the least possible.” He published 
this in 1746, shortly after becoming President of the Academy of Sci- 
ences in Berlin. Least action was later made more precise by such gi- 
ants as Euler, Hamilton, and Lagrange, and it has found enormously 
fruitful applications in such diverse fields as physics (quantum me- 
chanics) and biology (self-regulating, living systems). 

For Maupertuis, however, who seemingly was guided more by 
theological reasoning than by mathematical physics, least action 
brought ridicule down on his head, with the worst of it coming 
from his one-time friend Voltaire. That argument over least action 
became one of the nastiest scientific brawls in history, and it was 
initiated by a claim from a mathematician named König (for more
on him, see the end of appendix C) that, first, it was wrong, and 
second, that Maupertuis had stolen it anyway from an unpublished 
1707 letter by Leibniz! Euler declared Maupertuis was right, but he 
was no match for the poison-pen of math-illiterate Voltaire; both 
Euler and Maupertuis were the initial losers in this battle. Today 
we better understand who was right and who was not, but that is of 
little consequence for the dead. You can read more about this savage, 
bitter controversy in the essay by Bentley Glass, “Maupertuis, Pioneer of Genetics and Evolution,” included in Forerunners of Darwin, 1745-1859 (The Johns Hopkins University Press, 1959).


4.7 A Popular Textbook Problem 


The refraction of light and Fermat’s principle of least time have served as the inspiration for numerous calculus textbook problems of the following type (illustrated in figure 4.12). A man is in a powerboat in a lake, distance d from the nearest point (A) on the shore (which is taken to be straight). He wishes to travel, by a combination of motoring and running, to point C on the shore. Point C is a distance ℓ from A. That is, he will motor directly to some point B on the shore, distance x from A, and then run from B to C. If the boat travels at speed v₁ and if the man runs at speed v₂, then what is x (where is B?) so as to minimize his total travel time? To be as general as possible, we’ll consider both the case of v₁ < v₂ and of v₁ > v₂. This problem, even though less sophisticated than the superficially similar problem at the end of chapter 1, is worth some attention here because it has an easy-to-miss, subtle issue.

FIGURE 4.12. Geometry of yet another minimum-time lake-crossing problem. (Labels: starting point; v₁ (motoring); v₂ (running).)

We start by writing the total travel time T, as a function of x, as

$$T(x)=\frac{\sqrt{d^2+x^2}}{v_1}+\frac{\ell-x}{v_2}.$$
Following the standard prescription for finding an extremum (a minimum for T), we set the derivative of T(x) to zero:

$$\frac{dT}{dx}=\frac{1}{v_1}\,\frac{1}{2}\,\frac{1}{\sqrt{d^2+x^2}}\,2x-\frac{1}{v_2}=0.$$

With just a little algebra this is easily solved to give

$$x=\frac{d}{\sqrt{\left(v_2/v_1\right)^2-1}}.$$


It does seem a bit odd that this formal result for x (the location of B) is independent of ℓ, and so we might well wonder if this
formal result is actually correct. Well, it might be correct—but not 
necessarily! Here’s why. 

If (v₁/v₂) > 1 (if the boat travels faster than the man runs), then there is no real formal solution for x, because the denominator is imaginary. The physical interpretation for this case is simply that x = ℓ, i.e., the man should motor straight, all the way, to C. This is,
of course, the obvious statement that if the boat moves faster than 
the man can run, then the shortest total travel time is achieved by 
always traveling at the greater speed along the shortest path (the 
straight line segment joining his initial position directly with C). 

But even if v₁/v₂ < 1 (and so the formal solution for x is real) it is not always the correct solution. This is because it is physically obvious that, for any values of v₁ and v₂, x will be confined to the interval 0 ≤ x ≤ ℓ. After all, it makes no sense to motor to either an x > ℓ or an x < 0 and then run all the way back to C! Now, x is confined to this interval only if 0 < v₁/v₂ ≤ m, where m is the finite value of v₁/v₂ that gives x = ℓ (the condition v₁/v₂ = 0 gives the other extreme, so-called end-point value for x, that is, x = 0). We can solve for m by setting

$$\ell=\frac{d}{\sqrt{\left(1/m\right)^2-1}},$$

which gives

$$m=\left(\frac{v_1}{v_2}\right)_{\max}=\frac{1}{\sqrt{1+\left(d/\ell\right)^2}}.$$

So, even if v₁ < v₂, there is the possibility that the straight path from the starting point directly to C is the minimum-time path.


The answer to this problem is therefore actually not independent of ℓ, as the formal result misleadingly suggests. That is,

$$x=\begin{cases}\dfrac{d}{\sqrt{(v_2/v_1)^2-1}}, & \text{if } 0<\dfrac{v_1}{v_2}\le\dfrac{1}{\sqrt{1+(d/\ell)^2}},\\[2ex]\ell, & \text{otherwise.}\end{cases}$$


The moral is obvious: the solution to a minimization problem may 
be given by the vanishing of a derivative, but then it may also not 
be! This important conclusion is forgotten at the analyst’s peril. 
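The complete rule is easy to code, and it is reassuring to compare it against a brute-force search over the allowed interval 0 ≤ x ≤ ℓ. In the Python sketch below, the function name best_landing_point and the three test cases are illustrative. The second case (C close to A) and the third (v₁ > v₂) both land at the end point x = ℓ, where the derivative of T is not zero.

```python
import math


def best_landing_point(d, ell, v1, v2):
    """x = AB minimizing T(x) = sqrt(d^2 + x^2)/v1 + (ell - x)/v2 over 0 <= x <= ell."""
    m = 1.0 / math.sqrt(1.0 + (d / ell) ** 2)  # largest v1/v2 whose formal x stays in [0, ell]
    if v1 / v2 <= m:
        return d / math.sqrt((v2 / v1) ** 2 - 1.0)  # derivative-zero (formal) solution
    return ell  # end point: motor straight to C


def travel_time(x, d, ell, v1, v2):
    return math.hypot(d, x) / v1 + (ell - x) / v2


def brute_force(d, ell, v1, v2, n=100_000):
    grid = (ell * k / n for k in range(n + 1))
    return min(grid, key=lambda x: travel_time(x, d, ell, v1, v2))


cases = [
    (1.0, 5.0, 1.0, 2.0),  # runner faster, C far from A: interior optimum
    (1.0, 0.3, 1.0, 2.0),  # runner faster, but C close to A: end point x = ell
    (1.0, 5.0, 3.0, 2.0),  # boat faster: end point x = ell
]
for d, ell, v1, v2 in cases:
    print(d, ell, v1, v2, best_landing_point(d, ell, v1, v2), brute_force(d, ell, v1, v2))
```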


4.8 Snell’s Law and the Rainbow 


Physicists write Snell’s law in a slightly different manner than we 
have so far used, with c denoting the speed of light in a vacuum: 


$$\frac{\sin(\theta_i)}{\sin(\theta_r)}=\frac{v_1}{v_2}=\frac{c/v_2}{c/v_1}=\frac{n_2}{n_1},$$

where n₁ = c/v₁ and n₂ = c/v₂ are called the indices of refraction for
medium 1 and medium 2, respectively. That is, the index of refrac- 
tion for a medium is simply the ratio of the speed of light in a vac- 
uum to the speed of light in the medium. The usual case is, of course, 
that the index of refraction is a positive number greater than 1. For 
the normal mediums of air, water, and glass, the indices of refraction 
are usually taken to be 1, 1.333, and 1.5, respectively, but these are 
really just typical values. The index of refraction for a given medium 
isn’t just a single number, but rather is a function of the frequency
(wavelength) of the light. For example, the index of refraction for 
water decreases with increasing wavelength; in the visible portion of 
the electromagnetic spectrum (the so-called optical region), as light 
varies through the colors violet (“short” wavelength), blue, green, 
yellow, orange, to red (“long” wavelength), the index of refraction 
varies from 1.344 to 1.331. 

The fact that the index of refraction for a medium depends on the 
frequency of the light explains why what appears to be white light 
can be separated by refraction into various colored constituents. 
Each colored component of the total white light experiences a slightly different angle of refraction in Snell’s law, and so is separated from its other differently colored (different wavelength) companions. This effect, called dispersion, was discovered by Newton
in his famous glass prism experiment of 1666 (after both Descartes 
and Fermat were dead). 

With Snell’s law written as

$$\sin(\theta_r)=\frac{n_1}{n_2}\sin(\theta_i),$$

we can see that if n₂ > n₁ (as is the case when light, in air, is incident on a water surface), then θᵣ < θᵢ. That is, the refracted light is bent toward the normal. However, since θᵣ is still positive, the refracted light is not bent beyond the normal. The bent light beam travels into the water on the opposite side of the normal from the incident light. The contrary case, never seen in nature, would mean θᵣ < 0 and thus require a negative index of refraction. But would such a thing be impossible?

In the late 1960s, theoretical studies in the Soviet Union showed 
that a negative index medium, while undeniably strange, would not 
violate any of the fundamental laws of physics. In 2001, American 
physicists at the University of California/San Diego actually fabri- 
cated what they call a “structured metamaterial” that, in the micro- 
wave frequency band of 10.2 to 10.6 GHz, has a negative index of 
refraction. This is an extremely high frequency by many standards, 
e.g., the middle of the AM radio frequency band is one megahertz = 
0.001 GHz. Ten gigahertz, however, is a very low frequency com- 
pared to optical frequencies (on the order of 500,000 GHz), and 
whether or not negative index optical frequency devices can be made is still very much an open question. See R. A. Shelby, et al., “Experimental Verification of a Negative Index of Refraction” (Science, April 6, 2001, pp. 77-79).

My reason for getting into the physics of refraction as much as I 
have is that Descartes next used Snell’s law to explain, using a maxi- 
mum argument, the first mystery of the rainbow: why there is often 
a bright, circular arc of light in the sunlit sky after a rainstorm. You'll 
see how he did this in the next chapter, and how calculus (which 
he did not use) is the perfect tool with which to study the rainbow. 
The second mystery of the rainbow (why is it a multicolored arc of 
light, and not just a white arc?) remained a mystery to Descartes 
because he didn’t know about dispersion, and so he used a single 
number for the index of refraction for water (droplets in the sky). 
Descartes did have an “explanation” for the colors, but it is (like 
his “derivation” of Snell’s law itself) physical nonsense. My second 
reason for discussing the physics of the refraction of light is that, in 
1696, the Swiss mathematician Johann Bernoulli used Snell’s law to 
solve a physics minimization problem (discussed in chapter 6) that 
marks the origin of the calculus of variations, the next step up in 
advanced mathematics beyond the calculus itself. 


5.


Calculus Steps Forward, 
Center Stage 


5.1 The Derivative: Controversy and Triumph 


Starting with Fermat’s near miss of the derivative, and the later 
work by Newton and Leibniz, and others, in developing general 
differentiation formulas, the differential and integral calculus had, 
by 1700, become the mathematics for solving many (but not all, as 
you'll see when we get to later chapters) extrema problems. But not 
everybody was convinced that a quantum leap in mathematics had 
been achieved. As late as 1734, for example, the British philosopher 
George Berkeley (1685-1753) could rightfully pen an attack on the 
logical foundations of calculus, as he did in The Analyst: or a discourse 
addressed to an infidel mathematician. His motivation for this was 
more theological than mathematical, however; appointed a bishop 
that same year, he wrote The Analyst as a rebuttal to those who were 
turning away from the faith and embracing instead the so-called 
rationality of mathematics and science. Bishop Berkeley thought 
that view misguided, writing in his polemic “He who can digest a 
second or third fluxion, . . . need not, we think, be squeamish about 
any point of divinity.” Even more famous is his remark, also from 
The Analyst, which appears to try to tie calculus to the supernatural 
as much as to religion: “And what are the fluxions? The velocities 
of evanescent increments? They are neither finite quantities, nor 
quantities infinitely small, nor yet nothing. May we not call them 
ghosts of departed quantities?” Bishop Berkeley’s hope of showing 
calculus to be fatally flawed failed in the long run, but his criticisms 
did result in mathematicians returning time and again to the vital
task of placing calculus on a logically secure foundation. 

Since the start of the eighteenth century calculus has leapt from 
one spectacular triumph to the next, and continues to this day to be 
the rite-of-passage from high school math to the so-called advanced 
maths. Calculus has earned this reputation because of its ability to 
successfully handle problems that, without it, are simply impossible. 
In this chapter I’ll discuss a number of such problems, all mathematically interesting, and some historically important as well. So, to start, consider the following freshman calculus
puzzle that has been known to drive even math professors to despair. 

Imagine you are stranded on a desert island, with only a stick to 
write in the acres of sand that surround you. You certainly do not 
have a table of logarithms or a calculator! If asked “which is larger, $3^4$ or $4^3$?”, you would have no problem scribbling the solution in the sand with your stick: $3^4 = 3\cdot3\cdot3\cdot3 = 81 > 4^3 = 4\cdot4\cdot4 = 64$. This is easy because 3 and 4 are (small) integers. But what if the question is “which is larger, $e^\pi$ or $\pi^e$?”? Both $e$ and $\pi$ are transcendental, and that complicates matters (how do you multiply $e$ by itself $\pi$ times, or $\pi$ by itself $e$ times?). Since both $e$ and $\pi$ are close to 3 you would probably correctly guess that the two expressions have nearly the same value, but that doesn’t tell us which is the larger. What to do? With the derivative, it is “easy” (it’s always easy, if you think of the right approach).

Start by defining the function $h(x) = \ln(x)/x$ (thinking of this definition is the “hard” part of the problem!). With $f(x)$ and $g(x)$ as two functions of $x$, such that

$$h(x) = \frac{f(x)}{g(x)},$$

one of the fundamental differentiation formulas of calculus tells us that

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x)\dfrac{df}{dx} - f(x)\dfrac{dg}{dx}}{g^2(x)}.$$
For example, since $\tan(x) = \sin(x)/\cos(x)$, and as $(d/dx)\sin(x) = \cos(x)$ and $(d/dx)\cos(x) = -\sin(x)$, results easily established with the fundamental definition of the derivative, we then have

$$\frac{d}{dx}\tan(x) = \frac{\cos(x)\dfrac{d}{dx}\sin(x) - \sin(x)\dfrac{d}{dx}\cos(x)}{\cos^2(x)} = \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)}.$$
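As a numerical sanity check of the quotient-rule result (a minimal sketch, not part of the original argument), the snippet below compares a central-difference estimate of the derivative of $\tan(x)$ with $1/\cos^2(x)$. The helper names `central_diff` and `tan_derivative_by_quotient_rule`, and the step size, are illustrative choices.

```python
import math

def central_diff(f, x, step=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + step) - f(x - step)) / (2 * step)

def tan_derivative_by_quotient_rule(x):
    """Quotient rule with f = sin, g = cos: (g f' - f g') / g**2."""
    f, g = math.sin(x), math.cos(x)
    df, dg = math.cos(x), -math.sin(x)
    return (g * df - f * dg) / g ** 2

for x in (0.0, 0.5, 1.0, 1.3):
    print(f"x = {x:.1f}: finite difference {central_diff(math.tan, x):.8f}, "
          f"quotient rule {tan_derivative_by_quotient_rule(x):.8f}, "
          f"1/cos^2 {1 / math.cos(x) ** 2:.8f}")
```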


Now, with $f(x) = \ln(x)$ and $g(x) = x$, we have from results established in the last chapter that

$$\frac{dh}{dx} = \frac{x\dfrac{d}{dx}\ln(x) - \ln(x)\dfrac{d}{dx}x}{x^2} = \frac{x\cdot\dfrac{1}{x} - \ln(x)}{x^2} = \frac{1 - \ln(x)}{x^2}.$$

Thus, the derivative vanishes (our condition for an extrema) when $1 - \ln(x) = 0$, i.e., when $x = e$. But what kind of extrema does $x = e$ give us? Is it a minimum or a maximum? We can argue geometrically that it is a maximum, as the plot of $h(x)$ in figure 5.1 shows.


[Figure 5.1: plot of $\ln(x)/x$ versus $x$, for $x$ from 0.5 to 5.0.]

FIGURE 5.1. This function has a (broad) maximum at $x = e$ (= 2.718...).
Geometrical arguments and plots are limited, however, to those situations where we can easily see “what is going on” with the function of interest. More generally, we need an analytical way to distinguish minimums from maximums, and such a way is provided by the second derivative. Isaac Newton was the first (1665) to see this and, ironically, the basic idea behind this analytical method is intuitively obvious if we look at it physically. So, to be specific, suppose $h(t)$ represents the height at time $t$ of a ball thrown upward. Then $dh/dt$ is the speed of the ball, and $dh/dt = 0$ simply says that the ball has an instantaneous speed of zero at its maximum height, i.e., it has stopped moving upward (positive speed) and is about to begin its fall back to the ground (negative speed because the direction of motion is reversed).

The second derivative, $d^2h/dt^2$, is the rate of change of the speed, i.e., it is the ball’s acceleration (due entirely to the force of gravity). But that force is always pointed downward toward the center of the Earth, opposite to the direction of increasing $h(t)$. Thus, $d^2h/dt^2 < 0$, always. This gives us the so-called second derivative test for a (local) maximum. If $d^2h/dt^2 < 0$ when $dh/dt = 0$, then $h(t)$ has an extrema that is a (local) maximum. If $d^2h/dt^2 > 0$ when $dh/dt = 0$, however, then $h(t)$ has an extrema that is a (local) minimum.

Bishop Berkeley was, as mentioned earlier, greatly distressed over 
the logical basis of the first derivative; one can easily imagine his 
horror at the second derivative. Indeed, here are his own words from 
The Analyst: “But the velocities of the velocities, the second, third, 
fourth, and fifth velocities, &c., exceed, if I mistake not, all human 
understanding. The further the mind analyseth and pursueth these 
fugitive ideas the more it is lost and bewildered... .” 

In the above discussion, $t$ (time) is the independent variable, but that is of no special consequence. The second derivative test applies equally well to functions of any independent variable, e.g., to the $h(x)$ of our original problem. So, calculating the second derivative, we have

$$\frac{d^2h}{dx^2} = \frac{d}{dx}\left[\frac{1 - \ln(x)}{x^2}\right] = \frac{x^2\left(-\dfrac{1}{x}\right) - [1 - \ln(x)]\,2x}{x^4} = \frac{-3x + 2x\ln(x)}{x^4}.$$

Thus,

$$\left.\frac{d^2h}{dx^2}\right|_{x=e} = \frac{-3e + 2e\ln(e)}{e^4} = \frac{-3e + 2e}{e^4} = -\frac{1}{e^3} < 0.$$

That is, $x = e$ is the location of the maximum of $h(x)$, just as illustrated in figure 5.1.
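The second derivative test can be sketched numerically as follows. This is an illustration under stated assumptions: `second_diff` is a plain central-difference estimate (the step size is an arbitrary choice), `second_derivative_test` assumes the caller has already located a point where the first derivative vanishes, and the tolerance separating “inconclusive” from a definite sign is a hypothetical parameter.

```python
import math

def h(x):
    """h(x) = ln(x)/x, the function used in the e-versus-pi comparison."""
    return math.log(x) / x

def h2_closed_form(x):
    """Second derivative derived in the text: (-3x + 2x ln x) / x**4."""
    return (-3 * x + 2 * x * math.log(x)) / x ** 4

def second_diff(f, x, step=1e-4):
    """Central-difference estimate of f''(x)."""
    return (f(x + step) - 2 * f(x) + f(x - step)) / step ** 2

def second_derivative_test(f, x, tol=1e-9):
    """Classify a point where f'(x) = 0 (assumed, not checked) by the sign of f''(x)."""
    d2 = second_diff(f, x)
    if d2 < -tol:
        return "local maximum"
    if d2 > tol:
        return "local minimum"
    return "inconclusive"

print(second_diff(h, math.e), h2_closed_form(math.e), -1 / math.e ** 3)
print(second_derivative_test(h, math.e))   # local maximum
```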

By the very meaning of maximum, any value of $x \neq e$, such as $x = \pi$, will give a smaller value for $h(x)$. Thus, for the $h(x)$ of our original problem,

$$\frac{\ln(e)}{e} = \frac{1}{e} > \frac{\ln(\pi)}{\pi},$$

or

$$\pi > e\ln(\pi) = \ln(\pi^e).$$

Thus,

$$e^\pi > \pi^e,$$

and we are done. In fact, a calculator does confirm that $e^\pi = 23.14069\ldots$ is indeed larger (but not by very much) than $\pi^e = 22.45915\ldots$.
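The chain of inequalities above can be checked directly with a quick sketch that uses only Python’s standard `math` module:

```python
import math

def h(x):
    """h(x) = ln(x)/x, which has its maximum at x = e."""
    return math.log(x) / x

# h(e) > h(pi) is the key step; it gives pi > e*ln(pi) = ln(pi**e), hence e**pi > pi**e.
print(h(math.e) > h(math.pi))   # True

e_to_pi = math.exp(math.pi)
pi_to_e = math.pi ** math.e
print(f"{e_to_pi:.8f} > {pi_to_e:.8f}")   # 23.14069263 > 22.45915772
```

The printed values agree with the calculator figures quoted above, which the text truncates after five decimals.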


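The differentiation rule for a quotient also quickly gives us the rule for differentiating a product, i.e., the formula for

$$\frac{d}{dx}\{f(x)u(x)\} = \,?$$

If we define $u(x) = 1/g(x)$, then we have from before that,

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g\dfrac{df}{dx} - f\dfrac{dg}{dx}}{g^2}, \qquad g = \frac{1}{u}.$$

Applying the quotient rule to $(d/dx)[1/u]$, and remembering that the derivative of a constant is zero, we have

$$\frac{d}{dx}\left(\frac{1}{u}\right) = \frac{u\cdot 0 - 1\cdot\dfrac{du}{dx}}{u^2} = -\frac{1}{u^2}\frac{du}{dx},$$

and so

$$\frac{d}{dx}\{f(x)u(x)\} = \frac{\dfrac{1}{u}\dfrac{df}{dx} + \dfrac{f}{u^2}\dfrac{du}{dx}}{1/u^2} = u\frac{df}{dx} + f\frac{du}{dx}.$$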


By 1677 the rules for differentiating quotients and products 
were known to Leibniz. 

The rule for differentiating a product leads immediately to 
one of the fundamental results of integral calculus: the for- 
mula for integration-by-parts. If we take advantage of Leibniz’s 
differential notation and “multiply through” by dx, then we 
obtain 


$$d(fu) = u\,df + f\,du,$$

or

$$u\,df = d(fu) - f\,du.$$

Then, integrating from $x = a$ to $x = b$, we arrive at

$$\int_a^b u(x)\,df = \Big\{f(x)u(x)\Big\}\Big|_a^b - \int_a^b f(x)\,du.$$


We'll use this result at a crucial point in chapter 6 when we 
derive the Euler-Lagrange differential equation, which is at the 
core of the calculus of variations. 
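To see the integration-by-parts formula at work numerically, here is a small sketch that evaluates both sides with Simpson’s rule. The choice $u(x) = x$, $f(x) = e^x$ on $[0, 1]$ is an illustrative assumption, not an example from the text; both sides should come out to 1.

```python
import math

def simpson(func, a, b, n=1000):
    """Composite Simpson's rule for the integral of func over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3

# Illustrative (hypothetical) choice: u(x) = x and f(x) = e**x on [0, 1].
def u(x):
    return x

def du_dx(x):
    return 1.0

def f(x):
    return math.exp(x)

def df_dx(x):
    return math.exp(x)

a, b = 0.0, 1.0
# Left side: integral of u df, i.e. integral of u(x) f'(x) dx.
lhs = simpson(lambda x: u(x) * df_dx(x), a, b)
# Right side: [f u] from a to b, minus the integral of f du = f(x) u'(x) dx.
rhs = (f(b) * u(b) - f(a) * u(a)) - simpson(lambda x: f(x) * du_dx(x), a, b)
print(lhs, rhs)   # both approximately 1
```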


What does it mean if, when $h'(x) = dh/dx = 0$, we have $h''(x) = d^2h/dx^2 = 0$ as well? The second derivative test, which asks if $h''(x)$ is either greater than or less than zero, would seem to be equivocating when $h''(x)$ is equal to zero. And indeed it is. In this case $h(x)$ may or may not have an extrema. It is easy to demonstrate both possibilities. Suppose $h(x) = x^3$. Then $h'(x) = 3x^2$ and $h''(x) = 6x$. Both derivatives vanish at $x = 0$, where there is not an extrema, as shown in the first half of figure 5.2. However, if $h(x) = x^4$, then $h'(x) = 4x^3$ and $h''(x) = 12x^2$, and again both derivatives vanish at $x = 0$, where there is an extrema (a minimum), as shown in the second half of figure 5.2. We can distinguish the “extrema” and “no extrema” cases by observing that, for an extrema, $h''(x)$ does not change sign around the extrema (indeed, $12x^2$ never changes its sign), while when there is no extrema, $h''(x)$ does change its sign around the value of $x$ that gives $h''(x) = 0$ ($6x$ does change sign around $x = 0$). In this last case, we say $h(x)$ has an inflection point.

FIGURE 5.2. (a) Zero 2nd derivative, no extrema. (b) Zero 2nd derivative, with extrema.
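The sign rule just described can be written as a small helper. This is a sketch under explicit assumptions: the caller supplies $h''$ in closed form, $x_0$ is a point where $h'$ and $h''$ both vanish, and `delta` is small enough that $h''$ has no other zeros in $[x_0 - \delta, x_0 + \delta]$. The name `classify_flat_point` is hypothetical.

```python
def classify_flat_point(h2, x0, delta=1e-3):
    """Given the second derivative h2 of h and a point x0 where h'(x0) = h''(x0) = 0,
    apply the text's sign rule: if h'' has the same sign on both sides of x0 there is
    an extremum; if it changes sign there is an inflection point. Assumes h'' has no
    other zeros in [x0 - delta, x0 + delta]."""
    left, right = h2(x0 - delta), h2(x0 + delta)
    if left > 0 and right > 0:
        return "minimum"
    if left < 0 and right < 0:
        return "maximum"
    return "inflection point"

print(classify_flat_point(lambda x: 6 * x, 0.0))         # h(x) = x**3: inflection point
print(classify_flat_point(lambda x: 12 * x ** 2, 0.0))   # h(x) = x**4: minimum
```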


Here’s a pretty little differentiation problem of historical in- 
terest, using all of the above ideas, for you to try your hand at. 
Posed by the nineteenth-century Swiss mathematician Jacob 
Steiner (mentioned in chapter 2 in connection with the isoperimetric problem), it asks for the value of $x$ for which the $x$th root of $x$ is a maximum. That is, if we define $f(x)$ as

$$f(x) = \sqrt[x]{x} = x^{1/x}, \quad x > 0,$$


then for what x is f(x) the largest (and what is that maximum 
value)? Before starting your analysis you should convince your- 
self (with noncalculus reasoning!) that, as x increases from zero, 
f(x) first increases and then decreases, which suggests f(x) 
does indeed have a maximum. The answer is at the end of this 
chapter. 


5.2 Paintings Again, and Kepler’s Wine Barrel 


With the derivative, the original Regiomontanus problem from sec- 
tion 3.1, of determining the “best” distance to stand away from a 
painting hanging on a wall, becomes routine. In the notation of 
figure 3.2, the problem was to determine the $x$ that maximizes $\theta$ in the expression

$$\tan(\theta) = \frac{(b - a)x}{x^2 + (b - h)(a - h)}.$$

Since $\tan(\theta)$ increases with increasing $\theta$, then simply maximizing the right-hand side will also maximize $\theta$. In chapter 3 we used a tricky, noncalculus approach. But now we can write

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = 0,$$

with $f(x) = (b - a)x$ and $g(x) = x^2 + (b - h)(a - h)$, and solve. From the differentiation formula in the last section for a quotient, we see that this is equivalent to solving

$$g(x)\frac{df}{dx} = f(x)\frac{dg}{dx},$$

i.e., to solving




$$\left[x^2 + (b - h)(a - h)\right](b - a) = (b - a)x(2x).$$

This quickly results in $x = \sqrt{(b - h)(a - h)}$, just as we found in chapter 3.
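A brute-force check of the calculus result, as a sketch: the viewing angle is computed directly as a difference of two arctangents, which reproduces the $\tan(\theta)$ expression above if (as that expression implies) $a$ and $b$ are the heights of the bottom and top of the painting and $h$ is the viewer’s eye height, with $h < a < b$. The numbers $a = 7$, $b = 11$, $h = 5$ feet are hypothetical.

```python
import math

def viewing_angle(x, a, b, h):
    """Angle subtended at the eye by a painting whose bottom and top are at heights
    a and b, seen from eye height h (h < a < b) at horizontal distance x."""
    return math.atan((b - h) / x) - math.atan((a - h) / x)

a, b, h = 7.0, 11.0, 5.0                  # hypothetical heights in feet
x_formula = math.sqrt((b - h) * (a - h))   # from setting the derivative to zero

xs = [k / 100 for k in range(1, 2001)]     # brute-force scan, 0.01 ft to 20 ft
x_scan = max(xs, key=lambda x: viewing_angle(x, a, b, h))
print(f"formula: {x_formula:.4f} ft, scan: {x_scan:.2f} ft")
```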

Another historical problem that yields easily to the derivative is Kepler’s wine barrel problem (mentioned in the previous chapter), on how to make the right cylindrical wine barrel of maximum volume and prescribed diagonal ($\ell$). With the aid of the derivative, this is now a standard problem (in various disguises) in freshman calculus texts with, sadly, the history almost always unmentioned. In the notation of figure 5.3, where $r$, $h$, and $V$ are the barrel’s radius, height, and volume, respectively, we have

$$\ell^2 = (2r)^2 + h^2 = 4r^2 + h^2,$$

$$V = \pi r^2 h.$$

So,

$$r^2 = \frac{\ell^2 - h^2}{4},$$

and thus

$$V = \pi\,\frac{\ell^2 - h^2}{4}\,h = \frac{\pi}{4}\left(\ell^2 h - h^3\right).$$

With $V$ now expressed in terms of the single variable $h$, we can find the extrema of $V$ by writing

$$\frac{dV}{dh} = \frac{\pi}{4}\left(\ell^2 - 3h^2\right) = 0,$$

which says $h^2 = \frac{1}{3}\ell^2$. Thus,

$$r^2 = \frac{\ell^2 - \frac{1}{3}\ell^2}{4} = \frac{1}{6}\ell^2,$$

or, with $d = 2r$ as the barrel’s diameter, we have

$$\frac{1}{4}d^2 = \frac{1}{6}\ell^2,$$

or, $d^2 = \frac{2}{3}\ell^2$. That is,

$$\frac{d^2}{h^2} = \frac{\frac{2}{3}\ell^2}{\frac{1}{3}\ell^2} = 2, \quad\text{or}\quad \frac{d}{h} = \sqrt{2},$$

as stated back in chapter 4. The actual volume of the largest barrel is

$$V_{\max} = \pi r^2 h = \pi\left(\frac{d}{2}\right)^2 h = \frac{\pi}{4}d^2 h = \frac{\pi}{4}\cdot\frac{2}{3}\ell^2\cdot\frac{\ell}{\sqrt{3}} = \frac{\pi}{6\sqrt{3}}\,\ell^3 = 0.3023\,\ell^3.$$

FIGURE 5.3. Kepler’s wine barrel.
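The barrel results can be confirmed numerically with a short sketch, working in units where the diagonal $\ell = 1$ (an arbitrary normalization, since every length scales with $\ell$):

```python
import math

def barrel_volume(h, ell):
    """Volume of a right circular cylinder of height h whose diagonal is ell:
    r**2 = (ell**2 - h**2)/4, so V = (pi/4)(ell**2 h - h**3)."""
    return math.pi / 4 * (ell ** 2 * h - h ** 3)

ell = 1.0                                   # measure lengths in units of the diagonal
h_best = ell / math.sqrt(3)                 # from dV/dh = (pi/4)(ell**2 - 3h**2) = 0
d_best = math.sqrt(ell ** 2 - h_best ** 2)  # diameter d = 2r
v_best = barrel_volume(h_best, ell)

print(d_best / h_best)   # close to sqrt(2) = 1.41421...
print(v_best)            # close to pi/(6*sqrt(3)) = 0.3023 (times ell**3)

# Coarse scan over heights as an independent check of the maximum.
h_scan = max((k / 10000 for k in range(1, 10000)), key=lambda h: barrel_volume(h, ell))
```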


5.3 The Mailable Package Paradox 


An interesting maximization problem of more recent vintage gives rise to the mailable package paradox. When you send a package by
UPS (United Parcel Service), there are certain physical constraints 
you have to satisfy. These have changed over the years, but as I write, 
the maximum allowable length is 108", and the maximum size— 
defined by UPS as the length plus the package’s maximum girth—is 
130". The girth at any point along the length is the distance around
the cross section at that point (taken perpendicular to the length). 
Since it is the maximum girth that is used to determine the size, then 
it is clear that to maximize the package’s volume we should have 
all cross sections with the same girth. Since for a given girth (cross 
section perimeter) a circular cross section has the largest area, it then 
follows that a right circular cylinder is the shape of the maximum 
volume package (and not a sphere, as you'll soon see). 

This seemingly peculiar definition of size (length plus maximum 
girth) is used instead of the more obvious one of volume because it 
is easier and faster for a mail agent to determine. All that is needed is 
a flexible measuring tape, and no complicated volume calculations 
are required (just addition). But, there is a price paid for the conve- 
nience of this definition: it does occasionally lead to a paradoxical 
result. That is, it is possible to make two packages that, when pre- 
sented to a mail agent, are such that the agent will accept the larger 
volume but will reject the smaller volume! With calculus, this odd 
situation is easy to understand. 

Consider a cylindrical package of maximum volume, with radius $r$, length $x$, and volume $v$. If $S$ denotes the maximum size allowed, then

$$v = \pi r^2 x \quad\text{and}\quad S = x + 2\pi r.$$

So, $r = (S - x)/(2\pi)$, and so

$$v = \pi\left(\frac{S - x}{2\pi}\right)^2 x = \frac{1}{4\pi}\left(xS^2 - 2Sx^2 + x^3\right).$$

We have an extrema for the volume when $dv/dx = 0$, i.e., when

$$S^2 - 4Sx + 3x^2 = 0.$$

This is easily solved to give either $x = S$ or $x = \frac{1}{3}S$. We reject the first solution because then $r = 0$, which certainly gives the minimum volume of zero! We therefore have $x = \frac{1}{3}S$ for the maximum volume (I’ll leave it for you to verify that $d^2v/dx^2 < 0$ at $x = \frac{1}{3}S$, which means we have a maximum), which for UPS is acceptable, since then $x = \frac{1}{3}\cdot 130'' < 108''$. Thus, the cylindrical package of maximum volume has the volume

$$v = \pi\left(\frac{S - \frac{1}{3}S}{2\pi}\right)^2\frac{S}{3} = \frac{S^3}{27\pi} = 0.0117893\,S^3.$$


This is considerably larger than the spherical package of maximum volume, because the circular cross sections of a sphere do not all have the same (maximum) girth. We can see that this is so because, if the sphere has length (diameter) $x$ then its radius is $\frac{1}{2}x$ and so its maximum girth is $2\pi\left(\frac{1}{2}x\right) = \pi x$. Thus, the UPS size is

$$x + \pi x = x(\pi + 1),$$

and the volume is

$$v = \frac{4}{3}\pi\left(\frac{x}{2}\right)^3 = \frac{\pi}{6}x^3.$$

We clearly maximize $v$ by simply maximizing $x$, which is achieved by dividing the maximum size $S$ by $\pi + 1$. So, the diameter of the largest mailable spherical package is

$$x = \frac{S}{\pi + 1},$$

and the volume is

$$v = \frac{\pi}{6}\left(\frac{S}{\pi + 1}\right)^3 = 0.0073705\,S^3,$$

which is less than 63% of the volume of the cylindrical package of maximum mailable volume.
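The two mailable volumes and their ratio are easy to reproduce. This sketch simply evaluates the formulas above with $S = 130$ inches, the UPS limit quoted in the text:

```python
import math

S = 130.0   # maximum UPS size (length plus girth), in inches, as quoted in the text

# Cylinder of maximum volume: length S/3, girth 2*pi*r = S - S/3.
x_cyl = S / 3
r_cyl = (S - x_cyl) / (2 * math.pi)
v_cyl = math.pi * r_cyl ** 2 * x_cyl   # equals S**3 / (27*pi)

# Largest mailable sphere: diameter x with x + pi*x = S.
x_sph = S / (math.pi + 1)
v_sph = math.pi / 6 * x_sph ** 3       # equals (pi/6) * (S/(pi+1))**3

print(f"cylinder length {x_cyl:.1f} in (limit 108 in)")
print(f"v_cyl/S^3 = {v_cyl / S**3:.7f}, v_sph/S^3 = {v_sph / S**3:.7f}")
print(f"sphere/cylinder volume ratio = {v_sph / v_cyl:.3f}")
```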

Now, what if the cross sections of a package are all the same but are not necessarily circular? This results in a somewhat surprising conclusion. Let each identical cross section have area $A$ and perimeter $P$. If we vary $P$ (keeping the cross section shape fixed), then it is dimensionally clear that $A = kP^2$, where $k$ is some positive constant (“depending” on the shape). Thus, if $x$ is the package length, we have the package volume and size as

$$v = kP^2x \quad\text{and}\quad S = x + P.$$

So,

$$x = S - P \quad\text{and}\quad v = kP^2(S - P) = kSP^2 - kP^3.$$

To find that $P$ that maximizes $v$, we write

$$\frac{dv}{dP} = 0 = 2kSP - 3kP^2,$$

or $P = \frac{2}{3}S$. (Again, you should confirm that $d^2v/dP^2 < 0$ at $P = \frac{2}{3}S$, which means we have a maximum.) Thus, $x = S - P = \frac{1}{3}S$, just as before for a constant, circular cross section. That is, independent of the shape of the package’s cross section, as long as it is the same everywhere, the package with maximum volume has length $\frac{1}{3}S$, one-third of the specified maximum size.
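Since $k$ only scales $v$, the maximizing perimeter should be the same for every cross-section shape. The sketch below checks this by brute force for two sample shapes, a square (with $A = P^2/16$) and a circle (with $A = P^2/(4\pi)$); the helper name `best_perimeter` and the sampling density are illustrative choices.

```python
import math

def best_perimeter(k, S, samples=100_000):
    """Brute-force maximize v(P) = k P**2 (S - P) over 0 < P < S, where the
    cross-section area is A = k P**2 and the package length is S - P."""
    candidates = (S * i / samples for i in range(1, samples))
    return max(candidates, key=lambda P: k * P ** 2 * (S - P))

S = 130.0
shapes = {"square": 1 / 16, "circle": 1 / (4 * math.pi)}   # A = P**2/16 and A = P**2/(4 pi)
for name, k in shapes.items():
    P = best_perimeter(k, S)
    print(f"{name}: P/S = {P / S:.4f}, length/S = {(S - P) / S:.4f}")
```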

Finally, the paradox. Imagine a cubical package with edge length $5S/24$. Its size exceeds the maximum allowable because

$$\underbrace{\frac{5S}{24}}_{\text{length}} + \underbrace{4\cdot\frac{5S}{24}}_{\text{girth}} = \frac{25S}{24} > S,$$

and so this package is not mailable. But, it has a volume of

$$\left(\frac{5S}{24}\right)^3 = 0.0090422\,S^3,$$

which is significantly less than the volume of the maximum volume cylindrical package. So, a UPS mail agent would accept the larger volume cylindrical package as small enough to mail, but would reject the smaller volume cubical package because it is too large! Ah, the complications of modern life.
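The paradox itself takes only a few lines to reproduce (a sketch, again with $S = 130$ inches; any positive $S$ gives the same ratios):

```python
import math

S = 130.0                          # maximum size in inches
edge = 5 * S / 24                  # cube edge length from the text
cube_size = edge + 4 * edge        # length plus girth = 25S/24
cube_volume = edge ** 3            # (5S/24)**3

cyl_volume = S ** 3 / (27 * math.pi)   # maximum-volume cylinder, whose size is exactly S

print("cube rejected:", cube_size > S)                               # True
print("cube holds less than cylinder:", cube_volume < cyl_volume)    # True
print(f"{cube_volume / S**3:.7f} vs {cyl_volume / S**3:.7f}")
```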


5.4 Projectile Motion in a Gravitational Field 


A classic use of the derivative is in the study of projectile motion 
through the Earth’s gravitational field. In this section I'll first show 
a simple application of the derivative to a number of athletic events 
and then, in the next section, a related military example dating 
from 1686. 

To start, imagine an athlete is a specialist in not only the shot put 
and the discus throw, but also in heaving the javelin and in golf! 
As different as these events are in their details, all can be expressed 
mathematically (at the most elementary level of sophistication) by a common model: the release of a projectile at height $h$ above the ground, with initial speed $v_0$ at release, and at a release angle of $\theta$. Eventually, the projectile returns to the Earth at some distance $R$ from the point directly beneath the release point. The values of $h$ and $v_0$ are assumed to be given for a given athlete; our problem here is to find the angle $\theta$ that maximizes $R$.

In the geometry of figure 5.4 (where the origin is the release point) we see that $y = -h$ when the projectile hits the ground at distance $x = R$. Using $g$ to denote the acceleration of gravity, and realizing that only the vertical component of the projectile’s velocity is affected by gravity (I am ignoring any air-drag effect), we can write the horizontal and vertical components of the projectile’s velocity, at time $t$, as

$$\frac{dx}{dt} = v_0\cos(\theta),$$

$$\frac{dy}{dt} = v_0\sin(\theta) - gt, \quad t > 0.$$

FIGURE 5.4. Projectile motion in Earth’s gravitational field.
(Notice, carefully, that these two derivatives are present because of the physics of the problem, and not because of any extrema calculation.) Since $x(0) = y(0) = 0$, these two differential equations are easily integrated to give

$$x(t) = v_0t\cos(\theta),$$

$$y(t) = v_0t\sin(\theta) - \frac{1}{2}gt^2.$$

If we solve the first equation for $t$, i.e., if we write

$$t = \frac{x}{v_0\cos(\theta)},$$

and then substitute into the second equation, we get

$$y = x\tan(\theta) - \frac{g}{2}\,\frac{x^2}{v_0^2\cos^2(\theta)}.$$

That is, $y$ is a quadratic function of $x$, and so we have the well-known result that, for given values of $v_0$ and $\theta$, the path of the projectile is a parabola.

Since $x = R$ when $y = -h$, then when the projectile hits the ground at time $t = T$, we have

$$v_0T\cos(\theta) = R,$$

$$v_0T\sin(\theta) - \frac{1}{2}gT^2 = -h.$$

So, from the first equation,

$$T = \frac{R}{v_0\cos(\theta)},$$

and thus, from the second equation,

$$\frac{R\sin(\theta)}{\cos(\theta)} - \frac{1}{2}g\,\frac{R^2}{v_0^2\cos^2(\theta)} = -h,$$

or

$$Rv_0^2\cos(\theta)\sin(\theta) - \frac{1}{2}gR^2 + hv_0^2\cos^2(\theta) = 0.$$

Using the trigonometric identity $\sin(2\theta) = 2\sin(\theta)\cos(\theta)$, this last expression becomes

$$\frac{1}{2}Rv_0^2\sin(2\theta) - \frac{1}{2}gR^2 + hv_0^2\cos^2(\theta) = 0,$$

or, at last, a result so important I’ll put it in a box:

$$\boxed{Rv_0^2\sin(2\theta) - gR^2 + 2hv_0^2\cos^2(\theta) = 0.}$$


As our athlete’s goal is to pick that $\theta$ (call it $\hat\theta$) that maximizes $R$, it now seems that we should introduce a derivative for mathematical reasons. Specifically, let’s differentiate term-by-term with respect to $\theta$ using the result from section 5.1 for how to differentiate products:

$$Rv_0^2\,2\cos(2\theta) + v_0^2\sin(2\theta)\frac{dR}{d\theta} - 2gR\frac{dR}{d\theta} - 2hv_0^2\,2\cos(\theta)\sin(\theta) = 0.$$

At an extrema (a maximum of $R$), we will have $R'(\theta) = 0$, which gives

$$2Rv_0^2\cos(2\theta) - 4hv_0^2\cos(\theta)\sin(\theta) = 0.$$

And finally, using the above double-angle identity once more, this reduces to

$$2Rv_0^2\cos(2\theta) - 2hv_0^2\sin(2\theta) = 0,$$

or

$$R = h\tan(2\theta).$$


This isn’t, however, quite what we are after, which is the particular $\theta$ that maximizes $R$ for a given $v_0$ and $h$. But this result isn’t useless, either; once we do have the value of that optimum $\theta = \hat\theta$, we can then find the actual distance achieved from $R_{\max} = h\tan(2\hat\theta)$. But, first, what is $\hat\theta$? We can get our hands on $\hat\theta$ by substituting $R_{\max} = h\tan(2\hat\theta)$ into our earlier boxed result that is true for any value of $\theta$: $Rv_0^2\sin(2\theta) - gR^2 + 2hv_0^2\cos^2(\theta) = 0$. Thus,

$$hv_0^2\tan(2\theta)\sin(2\theta) - gh^2\tan^2(2\theta) + 2hv_0^2\cos^2(\theta) = 0,$$

or

$$v_0^2\tan(2\theta)\sin(2\theta) + 2v_0^2\cos^2(\theta) = gh\tan^2(2\theta),$$

or

$$v_0^2\left[\frac{\sin^2(2\theta)}{\cos(2\theta)} + 2\cos^2(\theta)\right] = gh\,\frac{\sin^2(2\theta)}{\cos^2(2\theta)}.$$

Since another trigonometric identity tells us that

$$\cos^2(\theta) = \frac{1}{2}\left[1 + \cos(2\theta)\right],$$

this last result becomes

$$v_0^2\left[\frac{\sin^2(2\theta)}{\cos(2\theta)} + 1 + \cos(2\theta)\right] = gh\,\frac{1 - \cos^2(2\theta)}{\cos^2(2\theta)},$$

or

$$v_0^2\,\frac{\sin^2(2\theta) + \cos^2(2\theta) + \cos(2\theta)}{\cos(2\theta)} = v_0^2\,\frac{1 + \cos(2\theta)}{\cos(2\theta)} = gh\,\frac{[1 + \cos(2\theta)][1 - \cos(2\theta)]}{\cos^2(2\theta)}.$$

So, after the obvious cancellation,

$$v_0^2\cos(2\theta) = gh[1 - \cos(2\theta)],$$

or, solving this easy equation for $\cos(2\theta)$,

$$\cos(2\hat\theta) = \frac{gh}{v_0^2 + gh} = \frac{1}{1 + \dfrac{v_0^2}{gh}} = \frac{1}{1 + \alpha}, \qquad \alpha = \frac{v_0^2}{gh}.$$


The parameter $\alpha$ is characteristic of each particular athlete, depending on both height $h$ and strength (the speed $v_0$ of the projectile at the instant of release). So now, at last, we have the optimum value of $\theta$:

$$\hat\theta = \frac{1}{2}\cos^{-1}\left(\frac{1}{1 + \alpha}\right),$$

to give

$$R_{\max} = h\tan(2\hat\theta).$$


An exceptional case occurs for golf. There, $h$ is not a height associated with the player, as in the track-and-field events of the shot put, the javelin throw, and the discus toss. Rather, $h$ is the height of the ball tee, which I’ll take as essentially zero. Thus, independent of $v_0$ we have $\alpha = \infty$ and so $\hat\theta = \frac{1}{2}\cos^{-1}(0) = 45^\circ$, i.e., all golfers, independent of their individual strengths, have the same optimal angle when swinging for distance. The actual distance achieved does, of course, depend greatly on $v_0$.

Interestingly, for golf, our result for $R_{\max}$,

$$R_{\max} = h\tan(2\hat\theta),$$

is indeterminate (useless) because it reduces to

$$R_{\max} = 0\cdot\infty = \,???$$

To determine $R_{\max}$ for the golf case of $h = 0$, let’s return to our earlier boxed result just before we differentiated with respect to $\theta$, which is true for any $\theta$:

$$Rv_0^2\sin(2\theta) - gR^2 + 2hv_0^2\cos^2(\theta) = 0.$$

For $\theta = \hat\theta = 45^\circ$, we have $R = R_{\max}$, and as $h = 0$, then

$$R_{\max}v_0^2 - gR_{\max}^2 = 0,$$

or

$$R_{\max} = \frac{v_0^2}{g}.$$

A strong golfer can drive a ball off its tee at about $v_0 = 160$ feet/second, and so this analysis predicts the maximum driving distance to be

$$R_{\max} = \frac{(160\ \text{ft/sec})^2}{32.2\ \text{ft/sec}^2} = 795\ \text{ft}.$$

This is, indeed, a long drive, but one that occasionally is actually observed.
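Here is a sketch that evaluates $\hat\theta$ and $R_{\max}$ from the formulas above and checks them against a brute-force scan of launch angles. Assumptions: the landing distance for an arbitrary angle is computed directly from $y(T) = -h$; the shot-put values $v_0 = 45$ ft/sec and $h = 7$ ft are hypothetical, not from the text; $g = 32.2$ ft/sec² as in the text; and air drag is ignored, as in the analysis. The helper names are illustrative.

```python
import math

G = 32.2  # ft/sec**2, the value used in the text

def landing_distance(theta, v0, h, g=G):
    """Horizontal distance R where a projectile launched from height h, speed v0,
    angle theta (radians) reaches y = -h; uses the positive root of y(T) = -h."""
    vx, vy = v0 * math.cos(theta), v0 * math.sin(theta)
    t_land = (vy + math.sqrt(vy ** 2 + 2 * g * h)) / g
    return vx * t_land

def optimal_angle_and_range(v0, h, g=G):
    """theta_hat = (1/2) arccos(1/(1 + alpha)) with alpha = v0**2/(g h), and
    R_max = h tan(2 theta_hat). For h = 0 (golf) return 45 degrees and v0**2/g."""
    if h == 0:
        return math.pi / 4, v0 ** 2 / g
    alpha = v0 ** 2 / (g * h)
    theta_hat = 0.5 * math.acos(1 / (1 + alpha))
    return theta_hat, h * math.tan(2 * theta_hat)

# Hypothetical shot-put values: v0 = 45 ft/sec, released 7 ft above the ground.
theta_hat, r_max = optimal_angle_and_range(45.0, 7.0)
scan_angles = [math.radians(k / 100) for k in range(1, 9000)]   # 0.01 to 89.99 degrees
r_scan = max(landing_distance(t, 45.0, 7.0) for t in scan_angles)
print(f"theta_hat = {math.degrees(theta_hat):.2f} deg, R_max = {r_max:.3f} ft, scan = {r_scan:.3f} ft")

# Golf (h = 0): 45 degrees (in radians) and R_max = v0**2/g, about 795 ft.
print(optimal_angle_and_range(160.0, 0.0))
```

For the shot put the optimum comes out a little below 45 degrees, as the formula for $\hat\theta$ requires whenever $h > 0$.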


Now, one last point. A physicist or engineer would argue that it is physically obvious that our result for $\hat\theta$ gives a maximum in $R$, not a minimum. For $\theta > \hat\theta$, the projectile spends most of its time traveling vertically, not horizontally. For $\theta < \hat\theta$, gravity pulls the projectile back to Earth “too soon.” A mathematician, however, would want to apply the second derivative test, and this is a good exercise for you to run through. Simply start with the result of the first differentiation (before we set $R'(\theta) = 0$) and differentiate it again. Then set $R'(\theta) = 0$, as well as use our two results for that optimal case, $R = h\tan(2\hat\theta)$ and $\cos(2\hat\theta) = gh/(v_0^2 + gh)$. That will result (if you are careful with the algebra) in $R''(\theta) < 0$, which means the extrema in $R$ is, indeed, a maximum.


5.5 The Perfect Basketball Shot 


An interesting (and historically important, as you'll soon see) twist to the analysis of the previous section can also be found in a non-track-and-field athletic event: basketball. Assume a player is standing directly in front of a basketball hoop, preparing to make a free-throw shot. If we construct a coordinate system with its origin at the release point (i.e., where the ball leaves the player’s hands), then we have the geometry shown in figure 5.5. The ball is released at time $t = 0$ from the origin, at a launch angle $\theta$, with initial speed $v_0$, with the goal of having the ball drop through the hoop, i.e., of having the ball pass through the hoop’s location at $(x = \ell, y = h)$ on the falling portion of its parabolic trajectory. In this section, I’ll show you how calculus lets us determine the minimum value of $v_0$ required to do this, and then I’ll explain how (and why) this problem was stated and solved more than three centuries ago, long before the invention of basketball. (The explanation does not involve time travel!) Much of what follows was inspired by an essay written by C. W. Groetsch, “Halley’s Gunnery Rule” (The College Mathematics Journal, January 1997, pp. 47-50).

FIGURE 5.5. Geometry of basketball shooting.


If we say that the ball passes through the hoop at time $t = t_0$, then from the previous section we know that $x(t_0) = \ell$ and $y(t_0) = h$, where

$$x(t) = v_0t\cos(\theta), \qquad y(t) = v_0t\sin(\theta) - \frac{1}{2}gt^2.$$

That is, we require

$$\ell = v_0t_0\cos(\theta), \qquad h = v_0t_0\sin(\theta) - \frac{1}{2}gt_0^2,$$

and so

$$t_0 = \frac{\ell}{v_0\cos(\theta)},$$

and therefore,

$$h = \frac{\ell\sin(\theta)}{\cos(\theta)} - \frac{1}{2}g\,\frac{\ell^2}{v_0^2\cos^2(\theta)}.$$

That is,

$$h = \ell\tan(\theta) - \frac{1}{2}\,\frac{g\ell^2}{v_0^2}\sec^2(\theta).$$

Solving for $v_0^2$, we arrive at the somewhat complicated looking result,

$$v_0^2 = \frac{\frac{1}{2}g\ell^2\sec^2(\theta)}{\ell\tan(\theta) - h},$$

which tells us with what initial speed the player must send the ball on its way, given the hoop location (the values of $h$ and $\ell$) and the launch angle ($\theta$).

We can now derive an interesting mathematical constraint on $v_0^2$ that shows there is a minimum initial speed if the ball is to pass through the point $(\ell, h)$. This makes sense physically, of course. After all, if the hoop is (for example) 25 feet (horizontally) away from the player, and the hoop is 10 feet above the court, then even a nonmathematician, couch-potato, Larry Bird wanna-be knows intuitively that the puny launch speed of $v_0 = 1$ foot/second isn’t going to score any points! With some simple algebra we can find an expression for the minimum launch speed, in terms of $\ell$ and $h$. (This is equivalent to asking for the minimum energy shot.)

Since $\sec^2(\theta) = 1 + \tan^2(\theta)$, then

$$v_0^2 = \frac{\frac{1}{2}g\ell^2\left[1 + \tan^2(\theta)\right]}{\ell\tan(\theta) - h},$$

or

$$\frac{1}{2}g\ell^2 + \frac{1}{2}g\ell^2\tan^2(\theta) = v_0^2\ell\tan(\theta) - v_0^2h,$$

or

$$\frac{1}{2}g\ell^2\tan^2(\theta) - v_0^2\ell\tan(\theta) + \frac{1}{2}g\ell^2 + v_0^2h = 0,$$

or, finally,

$$\tan^2(\theta) - \frac{2v_0^2}{g\ell}\tan(\theta) + 1 + \frac{2v_0^2h}{g\ell^2} = 0,$$

a quadratic in $\tan(\theta)$. So, using the quadratic formula to solve for $\tan(\theta)$, we have a result so important I’ll put it in a box:

$$\boxed{\tan(\theta) = \frac{v_0^2}{g\ell} \pm \sqrt{\frac{v_0^4}{g^2\ell^2} - 1 - \frac{2v_0^2h}{g\ell^2}}.}$$

For this to make physical sense we demand that $\tan(\theta)$ be real, i.e., that the square root be of a nonnegative quantity. So,

$$\frac{v_0^4}{g^2\ell^2} - 1 - \frac{2v_0^2h}{g\ell^2} \geq 0,$$

which is easily manipulated into

$$v_0^4 - 2ghv_0^2 - g^2\ell^2 \geq 0.$$


The left-hand side of this inequality is a quadratic in $v_0^2$, to which we can again apply the quadratic formula to find its roots:

$$v_0^2 = \frac{2gh \pm \sqrt{4g^2h^2 + 4g^2\ell^2}}{2} = gh \pm g\sqrt{h^2 + \ell^2}.$$

But we can immediately reject the negative root because $v_0^2$ must, of course, be positive. Thus, we write

$$v_0^2 \geq g\left(h + \sqrt{h^2 + \ell^2}\right).$$

From our earlier numerical example of $\ell = 25$ feet and the hoop 10 feet above the court, then $h = 4$ feet for a player who releases the ball at a height 6 feet above the court, and we have

$$v_0^2 \geq 32.2\left(4 + \sqrt{16 + 625}\right)\ \frac{\text{ft}^2}{\text{sec}^2} = 944\ \frac{\text{ft}^2}{\text{sec}^2}.$$

For the ball to pass through the hoop at $(25, 4)$ we must have $v_0 \geq 30.7$ feet/second, i.e., the minimum speed is $v_0 = 30.7$ feet/second. This result does not, however, completely define what the player has to do to score with minimum expended energy; he must also determine the launch angle.

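To find the angle of the minimum energy shot, return to the boxed $\tan(\theta)$ expression and use the fact that at minimum launch speed the quartic inequality for $v_0$ becomes an equality ($v_0^4 - 2ghv_0^2 - g^2\ell^2 = 0$), so that the square root vanishes, and so

$$\tan(\theta) = \frac{v_0^2}{g\ell},$$

or

$$\theta = \tan^{-1}\left(\frac{v_0^2}{g\ell}\right) = \tan^{-1}\left(\frac{944}{(32.2)(25)}\right) = 49.54^\circ.$$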
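A short sketch reproducing the free-throw numbers. The helper name `min_speed_shot` is hypothetical, $g = 32.2$ ft/sec² as in the text, and the last lines confirm that the resulting trajectory really passes through the hoop.

```python
import math

G = 32.2  # ft/sec**2

def min_speed_shot(ell, h, g=G):
    """Minimum launch speed through (ell, h) and the matching launch angle:
    v0**2 = g (h + sqrt(h**2 + ell**2)) and tan(theta) = v0**2 / (g ell)."""
    v0_sq = g * (h + math.hypot(h, ell))
    return math.sqrt(v0_sq), math.atan(v0_sq / (g * ell))

v0, theta = min_speed_shot(25.0, 4.0)   # the text's example: hoop 25 ft away, 4 ft above release
print(f"v0 = {v0:.2f} ft/s, theta = {math.degrees(theta):.3f} degrees")

# Check that this trajectory passes through the hoop: y should equal h at x = ell.
t0 = 25.0 / (v0 * math.cos(theta))
y_at_hoop = v0 * math.sin(theta) * t0 - 0.5 * G * t0 ** 2
```

Carried at full precision, the angle comes out a hair above 49.54 degrees (about 49.545); the text’s value uses the rounded figure of 944 ft²/sec².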


But we still are not quite done. We have, so far, not specifically 
imposed the requirement that the ball drop through the hoop (the 
other way for the ball to pass through the hoop is on the upward 
portion of its trajectory, which is clearly not a legal basketball play!) 
We need to explore this issue next. 

The mathematical requirement we need, at time $t = t_0$, is that

$$\left.\frac{dy}{dt}\right|_{t=t_0} < 0,$$

which is simply the requirement that the ball’s vertical speed be negative as the ball passes through the hoop, i.e., downward-directed toward the ground. That ensures that the ball is falling through the hoop. Thus, as

$$\frac{dy}{dt} = v_0\sin(\theta) - gt$$

in general, then at time $t = t_0$, we can write

$$v_0\sin(\theta) - gt_0 < 0.$$
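This says

$$v_0 < \frac{gt_0}{\sin(\theta)} = \frac{g\ell}{v_0\sin(\theta)\cos(\theta)}.$$

That is,

$$v_0^2 < \frac{g\ell}{\sin(\theta)\cos(\theta)}.$$

Combining this with our previous result for $v_0^2$, we have

$$\frac{\frac{1}{2}g\ell^2\sec^2(\theta)}{\ell\tan(\theta) - h} < \frac{g\ell}{\sin(\theta)\cos(\theta)}.$$

Dividing through by $g\ell$ and then cross-multiplying (which preserves the inequality, since $\ell\tan(\theta) - h > 0$ whenever $v_0^2 > 0$), this becomes

$$\frac{1}{2}\ell\sin(\theta)\cos(\theta)\sec^2(\theta) < \ell\tan(\theta) - h,$$

or, since $\sin(\theta)\cos(\theta)\sec^2(\theta) = \tan(\theta)$, we have

$$\frac{1}{2}\ell\tan(\theta) < \ell\tan(\theta) - h.$$

Thus,

$$h < \frac{1}{2}\ell\tan(\theta),$$

and so, for the ball to drop through the hoop, we have the following inequality that must be satisfied by the launch angle:

$$\theta > \tan^{-1}\left(\frac{2h}{\ell}\right).$$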

The question now is: what angle $\theta$ goes with the minimum speed,
i.e., does the $\theta$ associated with the minimum energy shot
satisfy the above inequality? If it does, then the ball does indeed
drop through the hoop. Otherwise, the ball must rise through the
hoop and that would, in the context of our original problem, be
an illegal shot. So, as I just did for the specific numerical example,
let’s return to the boxed $\tan(\theta)$ equation but now insert the general
expression for the minimum $v_0^2$. As in the numerical example, the
square root in the boxed $\tan(\theta)$ equation is zero and so the required
launch angle is

$$\tan(\theta) = \frac{v_0^2}{g\ell} = \frac{g\left(h + \sqrt{h^2+\ell^2}\right)}{g\ell} = \frac{h + \sqrt{h^2+\ell^2}}{\ell}$$

$$= \frac{h}{\ell} + \sqrt{\left(\frac{h}{\ell}\right)^2 + 1} > \frac{h}{\ell} + \frac{h}{\ell} = 2\left(\frac{h}{\ell}\right).$$


Thus, the answer to our question is yes: if the minimum launch
speed is used, then the condition on $\theta$ for the ball to drop through
the hoop is satisfied.

The minimum speed (minimum energy) launch angle has a very
interesting geometric interpretation. In figure 5.6, I have constructed
a right triangle with a base angle of $\theta$ by giving it a base of length
$\ell$ and then two consecutive components to the vertical side: one of
length $h$ and the other of length $\sqrt{h^2+\ell^2}$. This right triangle has
then been divided into two other triangles, which I’ll call the upper
and lower triangles. Since the hypotenuse of the lower triangle
(which is also a side of the upper triangle) has length $\sqrt{h^2+\ell^2}$,
the same as the upper component of the vertical side, the upper
triangle is isosceles, which is why I have given the same angle $\beta$ to
the two angles shown in figure 5.6. The last angle labeled is $\alpha$, the
top angle of the lower triangle.

FIGURE 5.6. Geometry of the minimum energy launch angle.

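From elementary geometry we can now write the following sequence
of statements:


(a) $\theta + \beta = 90^\circ$, or $\theta = 90^\circ - \beta$;
(b) $\alpha + \tan^{-1}(h/\ell) = 90^\circ$, or $\alpha = 90^\circ - \tan^{-1}(h/\ell)$;
(c) $2\beta + (180^\circ - \alpha) = 180^\circ$, or $\beta = \frac{1}{2}\alpha = 45^\circ - \frac{1}{2}\tan^{-1}(h/\ell)$.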


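Substituting the expression for $\beta$ into the expression for $\theta$ in (a), we
have

$$\theta = 90^\circ - \left[45^\circ - \frac{1}{2}\tan^{-1}\left(\frac{h}{\ell}\right)\right] = 45^\circ + \frac{1}{2}\tan^{-1}\left(\frac{h}{\ell}\right) = \frac{90^\circ + \tan^{-1}(h/\ell)}{2}.$$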

That is, the minimum-launch-energy shot has a launch angle given
by the average of the line-of-sight angle from the player to the hoop
and the vertical. For example, returning to our numerical example
of $\ell = 25$ feet and $h = 4$ feet, the line-of-sight angle to the hoop
is $\tan^{-1}(4/25) = 9.09^\circ$, and so the launch angle for the minimum
speed (minimum energy) shot is

$$\theta = \frac{90^\circ + 9.09^\circ}{2} = 49.54^\circ,$$

just as we calculated earlier in the numerical example.
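The two expressions for the minimum-energy angle can be compared numerically for several targets. The first is $\tan(\theta) = (h + \sqrt{h^2+\ell^2})/\ell$. The second is the average $[90^\circ + \tan^{-1}(h/\ell)]/2$. This is a minimal sketch: the function names are hypothetical, and the sample targets other than (25, 4) are arbitrary choices of mine.

```python
import math


def angle_from_tangent(ell, h):
    """theta from tan(theta) = (h + sqrt(h^2 + ell^2)) / ell, in degrees."""
    return math.degrees(math.atan((h + math.hypot(h, ell)) / ell))


def angle_from_average(ell, h):
    """theta as the average of the line-of-sight angle and the vertical (90 deg)."""
    return (90.0 + math.degrees(math.atan(h / ell))) / 2.0


for ell, h in [(25.0, 4.0), (10.0, 0.0), (2000.0, 1000.0), (5.0, 12.0)]:
    print(f"ell = {ell:7.1f}, h = {h:7.1f}: "
          f"{angle_from_tangent(ell, h):.6f} vs {angle_from_average(ell, h):.6f} deg")
```

For a level target ($h = 0$) both forms reduce to the familiar 45°.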


5.6 Halley’s Gunnery Problem 


The basketball problem was originally solved in 1686, and it appeared
in print in a paper published in 1688 by the Royal Society in
its Philosophical Transactions. The author (who as Editor was easily
able to arrange to have the publication back-dated to 1686) was
Edmond Halley (1656-1742), and he was, of course, motivated by
something other than basketball, as that activity didn’t arrive on the
scene until considerably later. Today we remember Halley mostly for
two reasons: for the comet named after him, because he was the first
to recognize it as a periodically returning body traveling on a greatly
elongated elliptical orbit around the sun, and for being the force
(both spiritually and financially) behind getting Newton’s masterpiece
Principia Mathematica published in 1687. (Halley was also the
“infidel” mentioned in the subtitle of Bishop Berkeley’s attack on the
logic of calculus, The Analyst, because he had convinced a mutual
acquaintance that the Christian faith is a fairy tale.)
But Halley was also an accomplished scientist and mathematician
in his own right, and his solution to the “basketball problem” shows
a first-class intellect at work.

The last phrase of the rather long title to Halley’s paper gives us 
a clue to his motivation: “A discourse concerning gravity, and its 
properties wherein the descent of heavy bodies, and the motion of 
projectiles is briefly but fully handled: together with the solution of 
a problem of great use in gunnery.” What Halley did in this paper 
was to address the problem of determining the best way for a cannon 
to lob a projectile onto a target located above the gun, e.g., onto 
a town high up on a mountain side, with the gun located in the 
plains far below. As Halley wrote, in a second paper published in 
1695 (which contains a derivation of the minimum speed launch 
angle as the average of the line of sight and the vertical angles), 
it isn’t a good idea to simply blast away with all of the power the 
gun could possibly provide. That’s because such energetic projectiles 
arrive on target at such high speed that they “bury themselves too 
deep in the ground, to do all the damage that they might. . . which 
is a thing acknowledged by the besieged in all towns, who unpave 
their streets, to let the bombs bury themselves and thereby stifle the 
force of their splinters.” 

Halley therefore reasoned that the proper way to launch a bomb 
at the higher elevation target was to arrange for the bomb to drop 
onto the target with minimum kinetic energy. Now, even though a 
cannon-fired projectile is moving through the air at speeds much 
faster than a shot, a discus, a basketball, or even a golf ball, Halley
did as I have done in the basketball analysis, and ignored all air-drag
effects. That is, we will continue to assume energy is conserved, and 
so the sum of the kinetic and potential energies of the projectile will, 
at every instant of time, be a constant. 

At launch, the projectile has only kinetic energy of motion, and 
zero potential energy. At impact, it has the potential energy due to 
the height of the target, plus the kinetic energy of its motion at 
impact (which should be as small as possible). There is, of course, 
nothing we can do about the potential energy at the target height, 
and so to minimize the impact kinetic energy, one must minimize 
the launch (kinetic) energy, i.e., minimize the launch speed. And so 
Halley arrived at the basketball problem, long before the invention 
of basketball. An immediate implication of this conclusion is that 
the powder charge needed to send the projectile on its way is also 
minimized. This was, no doubt, attractive to those responsible for 
how the king’s coin was spent. The immediate question this raises, of 
course, is just what is the powder charge required to deliver a projec- 
tile, with minimum energy, to an elevated target? Halley answered 
this question, too. 

As derived in the golf ball example of section 5.4, a ball driven
off of its tee at an initial speed of $v_0$, at an angle of 45°, achieves its
maximum horizontal range of $v_0^2/g$ over a horizontal surface. What
is true for the golf ball is true for the cannon projectile (ignoring air
drag), if the cannon is fired over a horizontal surface with its barrel
elevated to 45°. Since $v_0^2 = g\left(h + \sqrt{h^2+\ell^2}\right)$ for the minimum energy
shot, the value of $v_0^2/g$ is $h + \sqrt{h^2+\ell^2}$, and this gives us Halley’s
so-called calibration rule: the powder charge required to deliver a
projectile onto a target at $(\ell, h)$ with minimum kinetic energy is
the same charge required to shoot the same projectile out of the
cannon (with 45° of barrel elevation) to a distance of $h + \sqrt{h^2+\ell^2}$.
A series of test firings for any given cannon and projectile could give
a table of powder charge versus projectile range. It is clear, of course,
that it is possible to have two targets, with very different values of
$\ell$ and $h$, requiring the same powder charge. For example, a target
at (2000, 1000) has the same required powder charge as a target at
(2690, 500), since for both targets $h + \sqrt{h^2+\ell^2} \approx 3236$. All that
remained for the gunner to do, then, was to use Halley’s angle rule
to get the proper elevation of the cannon barrel. For our two targets,
for example, the elevation angles are, respectively,

$$\theta = \frac{90^\circ + \tan^{-1}\left(\frac{1000}{2000}\right)}{2} = 58.3^\circ$$

and

$$\theta = \frac{90^\circ + \tan^{-1}\left(\frac{500}{2690}\right)}{2} = 50.3^\circ.$$
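Halley’s calibration rule and angle rule can be evaluated for the two example targets with a short sketch. The function names are my own, and distances are in whatever unit the target coordinates use.

```python
import math


def calibration_range(ell, h):
    """Halley's calibration rule: the level-ground range at 45 degrees that uses
    the same powder charge as the minimum-energy shot at target (ell, h)."""
    return h + math.hypot(h, ell)


def elevation_deg(ell, h):
    """Halley's angle rule: average of the line-of-sight angle and the vertical."""
    return (90.0 + math.degrees(math.atan2(h, ell))) / 2.0


targets = [(2000.0, 1000.0), (2690.0, 500.0)]
for ell, h in targets:
    print(f"target ({ell:.0f}, {h:.0f}): "
          f"equivalent 45-degree range = {calibration_range(ell, h):.2f}, "
          f"elevation = {elevation_deg(ell, h):.1f} deg")
```

The two equivalent 45° ranges agree to within about 0.01 units, so for practical purposes the two targets call for the same powder charge.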


The proper barrel elevation angle has a special minimization 
property that Halley also discovered, in response to a very practi- 
cal concern. Suppose the gunner makes a slight error in setting the 
elevation angle. How would that affect the accuracy of the bombardment?
That is, if he makes a slight change (error) of $\Delta\theta$ from
the correct $\theta$, then how much of a change is made in the impact
point? Note carefully that we are making an error only in $\theta$; the
powder charge, and hence $v_0$, is taken as correct.

We start by recalling a result from the previous section:

$$\tan^2(\theta) - \frac{2v_0^2}{g\ell}\tan(\theta) + 1 + \frac{2v_0^2h}{g\ell^2} = 0.$$


Remember what these symbols mean: $h$ is the height of the target,
and thus is a constant, but $\ell$ (the range of the projectile when it
is at height $h$) depends on $\theta$. If $\ell$ is the range to the target then, by
definition, $\theta$ is set correctly, because then the projectile and target
coincide! To simplify the algebra, let’s make the following definitions:


$u = \tan(\theta)$, a variable;
$a = h/\ell$, a variable;
$p = 2v_0^2/g$, a constant.

Then,

$$u^2 - \frac{p}{\ell}u + 1 + p\frac{a}{\ell} = 0,$$

which is easily solved for $\ell$:

$$\ell = \frac{p(u - a)}{u^2 + 1}.$$


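To find how $\ell$ varies with small changes in $\theta$, we can use the chain
rule. From the very definition of the derivative, if $\Delta\theta$ is a “small”
change in $\theta$, then the change in $\ell$ is $\Delta\ell$, where

$$\Delta\ell \approx \Delta\theta\,\frac{d\ell}{d\theta}.$$

By the chain rule,

$$\frac{d\ell}{d\theta} = \frac{d\ell}{du}\cdot\frac{du}{d\theta} = \frac{d\ell}{du}\cdot\frac{d}{d\theta}[\tan(\theta)] = \frac{1}{\cos^2(\theta)}\,\frac{d\ell}{du}.$$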


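Now,

$$\frac{d\ell}{du} = \frac{d}{du}\left[\frac{p(u-a)}{u^2+1}\right] = p\,\frac{d}{du}\left[\frac{u-a}{u^2+1}\right] = p\,\frac{(u^2+1)\left(1 - \frac{da}{du}\right) - (u-a)2u}{(u^2+1)^2}$$

$$= p\,\frac{u^2 + 1 - 2u^2 + 2au - (u^2+1)\frac{da}{du}}{(u^2+1)^2} = p\,\frac{1 + 2au - u^2 - (u^2+1)\frac{da}{du}}{(u^2+1)^2}.$$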


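Remembering that $h$ is a constant, we have

$$\frac{da}{du} = \frac{d}{du}\left(\frac{h}{\ell}\right) = \frac{\ell\frac{dh}{du} - h\frac{d\ell}{du}}{\ell^2} = -\frac{h}{\ell^2}\frac{d\ell}{du} = -\frac{a}{\ell}\frac{d\ell}{du},$$

and so

$$\frac{d\ell}{du} = p\,\frac{1 + 2au - u^2 + (u^2+1)\frac{a}{\ell}\frac{d\ell}{du}}{(u^2+1)^2}.$$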
This can be solved for $d\ell/du$ to give

$$\frac{d\ell}{du} = \left[\frac{\ell}{\ell(u^2+1) - ap}\right]\left[\frac{p\left(1 + 2au - u^2\right)}{u^2+1}\right].$$

The two factors on the right-hand side are such that, at perfect aim,
the second one is zero and the first one is finite. To see this,
consider the following.
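When the gunner’s aim is perfect, i.e., when he has set the angle
$\theta$ so that

$$\tan(\theta) = u = \frac{h}{\ell} + \sqrt{\left(\frac{h}{\ell}\right)^2 + 1} = a + \sqrt{a^2+1}$$

(as shown in the previous section), then we have

$$1 + 2au - u^2 = 1 + 2a\left[a + \sqrt{a^2+1}\right] - \left[a + \sqrt{a^2+1}\right]^2$$
$$= 1 + 2a^2 + 2a\sqrt{a^2+1} - \left[a^2 + 2a\sqrt{a^2+1} + a^2 + 1\right]$$
$$= 0.$$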


Thus, the second factor of $d\ell/du$ vanishes.
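The algebra leading to $d\ell/du$ can be double-checked numerically. The sketch below treats $\ell$ as a root of $\ell^2(u^2+1) - pu\ell + ph = 0$, which is the text's equation in $u$ multiplied through by $\ell^2$. Taking the larger root, so that the projectile is descending onto the target, is my assumption. The sketch then:

* compares the factored formula for $d\ell/du$ with a finite-difference derivative at an arbitrary $u$;
* evaluates the formula at perfect aim, $u = a + \sqrt{a^2+1}$, for the (2000, 1000) target.

The function names are hypothetical.

```python
import math


def ell_of_u(u, h, p):
    """Descending-branch range at height h: the larger root of
    ell^2 (u^2 + 1) - p u ell + p h = 0, where u = tan(theta), p = 2 v0^2 / g."""
    q = u * u + 1.0
    disc = (p * u) ** 2 - 4.0 * q * p * h
    return (p * u + math.sqrt(disc)) / (2.0 * q)


def dell_du_formula(u, h, p):
    """[ell / (ell (u^2+1) - a p)] * [p (1 + 2 a u - u^2) / (u^2 + 1)], a = h/ell."""
    ell = ell_of_u(u, h, p)
    a = h / ell
    q = u * u + 1.0
    return (ell / (ell * q - a * p)) * (p * (1.0 + 2.0 * a * u - u * u) / q)


def dell_du_numeric(u, h, p, du=1e-6):
    """Central-difference estimate of d(ell)/du, for comparison."""
    return (ell_of_u(u + du, h, p) - ell_of_u(u - du, h, p)) / (2.0 * du)


# Target (2000, 1000), powder charge fixed by the minimum-energy shot.
ell_t, h = 2000.0, 1000.0
p = 2.0 * (h + math.hypot(h, ell_t))  # p = 2 v0^2 / g; g cancels out
u_star = h / ell_t + math.sqrt((h / ell_t) ** 2 + 1.0)  # a + sqrt(a^2 + 1)

print(f"u = 2.0: formula {dell_du_formula(2.0, h, p):.6f}, "
      f"numeric {dell_du_numeric(2.0, h, p):.6f}")
print(f"u = u*:  formula {dell_du_formula(u_star, h, p):.2e}")
```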
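For the first factor of $d\ell/du$, notice that since $u^2 - (pu/\ell) + 1 +
p(a/\ell) = 0$, then $\ell u^2 + \ell = pu - ap$. Thus,

$$\ell(u^2+1) - ap = \ell u^2 + \ell - ap = pu - ap - ap = pu - 2ap,$$

and so the first factor of $d\ell/du$ is

$$\frac{\ell}{pu - 2ap} = \frac{1}{p}\cdot\frac{\ell}{u - 2a} = \frac{1}{p}\cdot\frac{\ell}{a + \sqrt{a^2+1} - 2a} = \frac{1}{p}\cdot\frac{\ell}{\sqrt{a^2+1} - a},$$

i.e., since $\sqrt{a^2+1} > a$, the first factor is positive and, more
importantly, finite. Thus, when the gunner’s aim is perfect we see that

$$\frac{d\ell}{du} = 0,$$

which says

$$\frac{d\ell}{d\theta} = \frac{1}{\cos^2(\theta)}\,\frac{d\ell}{du} = 0.$$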

Thus, when the angle is set correctly, we have $d\ell/d\theta = 0$, and this
says that, even when the aim is not quite perfect but is still “in the
neighborhood” of perfect aim, we have

$$\Delta\ell \approx \Delta\theta\,\frac{d\ell}{d\theta} = \Delta\theta\cdot(0) = 0.$$
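Halley’s insensitivity claim can also be checked numerically. This is a minimal sketch under these assumptions:

* no air drag and g = 32.2 feet/second²;
* the projectile meets the target on its descending branch;
* the helper names `range_at_height` and `slope` are hypothetical.

With the powder charge ($v_0$) fixed by the calibration rule for the (2000, 1000) target, the sketch estimates $d\ell/d\theta$ by central differences. It does this once at Halley’s angle and once at an angle 0.1 radian (about 5.7°) higher.

```python
import math

G = 32.2  # feet/second^2


def range_at_height(v0, theta, h, g=G):
    """Horizontal distance at which a drag-free projectile (launch speed v0,
    angle theta in radians) is at height h on the way down: the larger root of
    (g / (2 v0^2 cos^2 theta)) ell^2 - tan(theta) ell + h = 0."""
    c = g / (2.0 * v0 ** 2 * math.cos(theta) ** 2)
    t = math.tan(theta)
    return (t + math.sqrt(t * t - 4.0 * c * h)) / (2.0 * c)


def slope(f, x, dx=1e-5):
    """Central-difference estimate of df/dx."""
    return (f(x + dx) - f(x - dx)) / (2.0 * dx)


ell, h = 2000.0, 1000.0                                  # first target in the text
v0 = math.sqrt(G * (h + math.hypot(h, ell)))             # charge from the calibration rule
theta_star = (math.pi / 2.0 + math.atan2(h, ell)) / 2.0  # Halley's angle rule


def impact(theta):
    """Impact range at the target's height, with the powder charge (v0) held fixed."""
    return range_at_height(v0, theta, h)


print(f"impact range at theta*: {impact(theta_star):.4f}")
print(f"d(ell)/d(theta) at theta*:       {slope(impact, theta_star):.6f}")
print(f"d(ell)/d(theta) at theta* + 0.1: {slope(impact, theta_star + 0.1):.1f}")
```

The first slope is essentially zero. The second, for a badly set barrel, is large, which is the practical content of Halley’s remark quoted below.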


Halley summarized his minimum results, including this last one, as 
follows: “This Rule may be of good use to all Bombardiers and Gun- 
ners, not only that they may use no more Powder than is necessary, 
to cast their Bombs into the place assigned, but that they may shoot 
with much more certainty, for that a small Error committed in the 
Elevation of the Piece, will produce no sensible difference in the fall 
of the Shot.” Thus wrote Edmond Halley over three centuries ago, 
one of the world’s first modern theoreticians in the arcane art of 
military weapons analysis. 


5.7 De L’Hospital and His Pulley Problem, and a New 
Minimum Principle 


It is a curious fact that while Newton’s Principia is the origin of 
modern physics, a subject universally presented to modern students 
using Newton’s co-invention of the calculus, Principia itself does not 
use calculus. Rather, Newton presented the new physics with the 
mathematical aid of the “old math,” Euclidean geometry, which 
makes for a presentation vastly more difficult than does the modern 
approach. Why did Newton do that, even though he obviously 
possessed the math we use today (he invented it!)? Almost surely 
the answer is that Newton wanted to avoid distracting his readers 
from the physics, which the then still mysterious calculus would 
have done. In Principia, Newton’s goal was to champion his physics, 
not his math. 

The recognition for publishing the world’s first calculus book, 
then, goes not to Newton but to another. The French mathematician 
Guillaume-François-Antoine de L’Hospital (1661-1704), who was an
army cavalry officer until bad eyesight caused him to resign, has 
that honor. He was a quick study who could readily absorb the 
discoveries of others and then present them in a coherent manner 
for a wide audience. The contemplation of mathematics, then, was 
the perfect activity for a bright but nearsighted gentleman. 
In de L’Hospital’s time, when elementary calculus techniques
were first being discovered, there weren’t a lot of pedagogical re- 
sources around. So, what he did was hire the brightest of Leibniz’s 
own students, Johann Bernoulli (1667-1748), then still a young 
man, to teach him the new math. (We’ll hear again from Bernoulli, 
in the next chapter, in connection with one of the most famous min- 
imization problems in mathematics.) De L’Hospital paid Bernoulli 
well and came to believe that, since he had paid for the new results, 
then those accomplishments were his. Such a “purchase” of intel- 
lectual property rights would, today, be considered acceptable only 
in matters like a celebrity hiring a ghostwriter to pen a so-called 
autobiography, an activity that is itself on the borderline of dubious 
authorship. 

By 1696, de L’Hospital felt he had sufficient material on hand 
from Bernoulli to publish a book, Analyse des Infiniment Petits (anal- 
ysis of the infinitely small). While containing nothing of his own 
discoveries, the book was well written and quickly became famous. 
It did contain, however, many of Bernoulli’s discoveries, as well as 
those of Newton, Leibniz, and Bernoulli’s older brother Jacob. Some 
writers have commented that de L’Hospital gave no mention at 
all to the Bernoulli brothers, and others have said that he did. In 
fact, he did—but not very much! Two brief sentences appear in the 
preface—“I am obliged to the gentlemen Bernoulli for their many 
bright ideas; particularly to the younger Mr. Bernoulli who is now a 
professor. I have made free use of their discoveries... .” In fact, de 
L’Hospital’s words are a vast understatement. After de L’Hospital’s 
death, Bernoulli began to claim credit for nearly all of the book, 
a claim initially rejected by mathematicians and historians alike.
However, in 1922, a copy of a set of notes taken during a series 
of lectures Bernoulli gave on the differential calculus in Geneva, in 
1691, was discovered. The organization and content of those notes, 
written five years before de L’Hospital’s book, are virtually identical 
with the book. 


The classic example of de L’Hospital’s appropriation of Bernoulli’s
work is the famous rule for calculating indeterminate
limits. It is often the case that an analyst needs to compute
the value of a ratio of two functions of the same independent
variable (call it $x$) as $x$ approaches zero. That is, she needs to
compute the limit

$$R = \lim_{x\to 0} R(x) = \lim_{x\to 0}\frac{g(x)}{h(x)}.$$


Often, this is an easy calculation. For example, it is clear that 


3x48 8 
R = lim R(x) = lim Seo oe 
00a A 


But what do we do with something like

$$R = \lim_{x\to 0}\frac{\sin(x)}{x},$$


which reduces to the indeterminate 0/0 if we simply stick x = 0 
into the numerator and the denominator of the ratio? While 
it was Bernoulli who showed

$$R = \lim_{x\to 0}\frac{g(x)}{h(x)} = \lim_{x\to 0}\frac{g'(x)}{h'(x)},$$
and so

$$R = \lim_{x\to 0}\frac{\sin(x)}{x} = \lim_{x\to 0}\frac{\cos(x)}{1} = 1,$$


this formula is instead known today as L’Hospital’s rule, not as 
Bernoulli’s rule. It is easy to derive. 


Since $g(x) = R(x)h(x)$, differentiation of both sides
gives

$$g'(x) = R(x)h'(x) + R'(x)h(x).$$

Since $\lim_{x\to 0} h(x) = 0$, and if we assume $R(x)$ really does have
a limit as $x \to 0$, i.e., $\lim_{x\to 0} R(x) = R$, then

$$\lim_{x\to 0} g'(x) = \lim_{x\to 0} R(x)h'(x) + \lim_{x\to 0} R'(x)h(x) = R\lim_{x\to 0} h'(x) + \lim_{x\to 0} R'(x)\lim_{x\to 0} h(x).$$
The last term is zero because $\lim_{x\to 0} h(x) = 0$ and because, for the
well-behaved functions we have in mind, the very fact that
$\lim_{x\to 0} R(x) = R$ implies that $\lim_{x\to 0} R'(x) = 0$, too (i.e., the
$y = R(x)$ curve must approach the horizontal, zero-slope line $y = R$
as $x \to 0$). So,

$$\lim_{x\to 0} g'(x) = R\lim_{x\to 0} h'(x),$$

and we have L’Hospital’s rule.
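The rule can be illustrated numerically with a pair of functions of my own choosing (not from the text): $g(x) = e^x - 1$ and $h(x) = x$. Both are zero at $x = 0$, so the ratio $g(x)/h(x)$ at ever-smaller $x$ should approach $g'(0)/h'(0) = 1$. This is an illustration of the idea, not a proof. `math.expm1` is used so that the numerator stays accurate for tiny $x$.

```python
import math

# Hypothetical example pair (my choice, not from the text):
# g(x) = e^x - 1 and h(x) = x both vanish at x = 0, so g(0)/h(0) is 0/0.


def g(x):
    return math.expm1(x)  # e^x - 1, accurate even for tiny x


def h(x):
    return x


def g_prime(x):
    return math.exp(x)


def h_prime(x):
    return 1.0


rule_value = g_prime(0.0) / h_prime(0.0)  # the rule's prediction, g'(0)/h'(0)
for x in (1e-1, 1e-3, 1e-6):
    print(f"x = {x:g}: g(x)/h(x) = {g(x) / h(x):.9f}")
print(f"g'(0)/h'(0) = {rule_value}")
```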


Oddly enough, de L’Hospital was actually quite a good mathe- 
matician in his own right, and so why he did what he did remains, 
I think, a bit of a puzzle. Bernoulli, for his part, had remained silent 
until 1704 because his agreement with de L’Hospital had been that, 
in exchange for the rather large payments made for giving his math 
lessons, Bernoulli would remain quiet. Bernoulli’s own complicity in 
this peculiar “contract” is also perplexing. Well, no matter the con- 
flicted ethical issues involved with de L’Hospital’s book, it does con- 
tain a number of interesting mathematical problems. One of them, 
in particular, demonstrates not only the differential calculus, but 
also the power of a new minimum principle similar in spirit to Fer- 
mat’s least-time principle (see Alexander J. Hahn, “Two Historical 
Applications of Calculus,” The College Mathematics Journal, March 
1998, pp. 93-103). 

The problem, easy to visualize, is illustrated in figure 5.7. At point 
A on a ceiling we attach one end of an idealized, weightless cable of 
length r. The other end of this completely flexible cable is attached 
to the center axle of a weightless pulley. At point B on the ceiling, 
distance d from A, we attach one end of another idealized cable of 
length $\ell$, and pass it over the pulley. The other end of this second
cable is attached to a block of material with weight W. We then let 
this system of cables, pulley, and weight freely adjust itself to its 
final, stationary (unmoving) configuration under the influence of 
gravity, with the pulley’s ultimate location labeled as point C. 

If r < d, then it is physically clear that the final equilibrium 
position of the system will be as shown in the illustration, with C 
below and between A and B, and with the weight directly below 
C, at point D. This is the case of mathematical interest, as well, 
FIGURE 5.7. Geometry of L’Hospital’s pulley problem.


because if $r > d$, it is equally obvious that the weight would
simply hang straight below B. That is, the weightless pulley would
slide along the cable attached to the weight until the pulley’s own
cable is pulled straight (or, if $r > \sqrt{\ell^2 + d^2}$, until the pulley
rests on top of the weight). Therefore, if $r > d$, the weight hangs
directly beneath B, distance $\ell$ below the ceiling, and that completely
describes the equilibrium configuration of the system.

Far more interesting, physically and mathematically, is the case
$r < d$. What, then, is the equilibrium configuration of the system?
That is, where does the pulley end up? To start the analysis of this
question, let point C be distance $x$ to the left of A (and so distance
$d - x$ to the right of B). Obviously, $0 < x < r$. Using the Pythagorean
theorem twice, it is easy to see, as illustrated in figure 5.7, that the
distance the weight hangs below the ceiling is a function of $x$ (let’s
call it $f(x)$) given by

$$f(x) = \sqrt{r^2 - x^2} + \ell - \sqrt{(d-x)^2 + r^2 - x^2}$$

(the pulley sits $\sqrt{r^2 - x^2}$ below the ceiling, and the cable from B
uses up $\sqrt{(d-x)^2 + r^2 - x^2}$ of its length $\ell$ in reaching the pulley).
It may not be obvious, however, just what we should do with f(x)
to help us find where C is. In fact, to continue with the calculus so- 
lution I now need to introduce that new minimum principle I men- 
tioned earlier. Before I do that, however, let me solve the problem 
in an entirely different way, not using calculus, and then when we 
return to f(x) and apply calculus to it, we will be able to check the 
answer (be assured, the answers will agree!) The calculus approach 
will prove to be the easier to perform. 

De L’Hospital’s pulley problem is actually a type of problem com- 
monly encountered by students in a first-year course in physics and 
engineering, when studying statics (the physics of unmoving sys- 
tems). The key physical observation is that the cables, pulley, and 
weight are not moving in their final, stable configuration. In partic- 
ular, the pulley is not moving. Newton’s physics then tells us that 
this means there is no net force acting on the pulley; otherwise the 
pulley would be accelerated, i.e., it would move. So, the stable, or 
equilibrium, configuration can be found by setting the sum of all 
of the individual horizontal forces acting on the pulley to zero, and 
similarly for the sum of all the individual vertical forces acting on 
the pulley. Those forces come from the tensions in the two cables. 
The cable attached to the weight has tension W. This is clearly so 
for the vertical portion of that cable and, since the tension must be 
the same all along the cable (if not, there would be some place on 
the cable with a nonzero net force there and that part of the cable 
would move, contrary to the reality that the cable is not moving) 
the tension is everywhere W, even in the nonvertical portion of the 
cable. The horizontal component of this tension is directed to the
left, with value $W\cos(\beta)$. If we call the tension in the other cable
(the one attached to the pulley) $T$, then that tension has horizontal
component $T\cos(\alpha)$ directed to the right. Thus,

$$W\cos(\beta) - T\cos(\alpha) = 0,$$

or

$$T = W\frac{\cos(\beta)}{\cos(\alpha)}.$$

Now, the vertical sum of forces on the pulley is given by

$$W\sin(\beta) + T\sin(\alpha) - W = 0.$$
Substituting in for $T$,

$$W\sin(\beta) + W\frac{\cos(\beta)\sin(\alpha)}{\cos(\alpha)} - W = 0,$$

or

$$\sin(\beta) + \cos(\beta)\tan(\alpha) - 1 = 0.$$
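From the geometry of figure 5.7, we can write

$$\sin(\beta) = \frac{\sqrt{r^2 - x^2}}{\sqrt{(d-x)^2 + r^2 - x^2}},\qquad \cos(\beta) = \frac{d - x}{\sqrt{(d-x)^2 + r^2 - x^2}},\qquad \tan(\alpha) = \frac{\sqrt{r^2 - x^2}}{x}.$$

Thus,

$$\frac{\sqrt{r^2 - x^2}}{\sqrt{(d-x)^2 + r^2 - x^2}} + \frac{d - x}{\sqrt{(d-x)^2 + r^2 - x^2}}\cdot\frac{\sqrt{r^2 - x^2}}{x} = 1,$$

which can be algebraically manipulated into

$$2x^2d - r^2x - r^2d = 0.$$

(Combining the two terms on the left gives $d\sqrt{r^2 - x^2} = x\sqrt{(d-x)^2 + r^2 - x^2}$.
Squaring this produces the cubic $(x - d)(2x^2d - r^2x - r^2d) = 0$, and since
$x < r < d$, the factor $x - d$ cannot vanish.)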


This quadratic in x is now easy to solve (and I'll do that in just a bit), 
and our question (where is point C, the location of the pulley?) is 
answered. 

Now, let’s return to that $f(x)$ function derived earlier (which tells
us how far below the ceiling the weight hangs). The new minimum
principle I mentioned before is now applied: the system is in stable
equilibrium when its potential energy is minimum. (We'll use this
same argument again, in the next chapter, to solve a much more
famous problem than this one.) That is, stable equilibrium occurs
when the weight hangs as far below the ceiling as possible, which
occurs when $f(x)$ is maximum. So, all we need to do is set the
derivative of $f(x)$ equal to zero and solve for $x$, i.e.,
$$\frac{df}{dx} = \frac{-2x}{2\sqrt{r^2 - x^2}} - \frac{-2(d-x) - 2x}{2\sqrt{(d-x)^2 + r^2 - x^2}} = 0.$$

Once again, this expression is easy to algebraically manipulate to
give

$$2x^2d - r^2x - r^2d = 0,$$

precisely the quadratic result we got from the statics analysis.
To finish the problem, all we need do now is to actually solve the
quadratic. We, of course, formally get two answers:

$$x = \frac{r^2 \pm \sqrt{r^4 + 8d^2r^2}}{4d} = \frac{r^2}{4d} \pm \frac{r}{4d}\sqrt{r^2 + 8d^2}.$$

It is physically obvious that $x$ is not negative, and so we reject the
negative root. So, the location of the pulley is given by

$$x = \frac{r}{4d}\left[r + \sqrt{r^2 + 8d^2}\right].$$

Notice, too, that the constraint $x < r$ is also satisfied, because we
can write $x$ as

$$x = \frac{r}{4}\left[\frac{r}{d} + \sqrt{\left(\frac{r}{d}\right)^2 + 8}\right]$$

and, since $r/d < 1$, it follows that

$$x < \frac{r}{4}\left[1 + \sqrt{1 + 8}\right] = r.$$
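Both routes to the pulley’s position can be cross-checked numerically. The function names in this sketch are hypothetical, and $r = 3$, $d = 5$ are illustrative values of mine that satisfy $r < d$. The sketch:

* finds the maximum of $f(x)$ by bisection on $f'(x) = 0$ and compares it with the closed-form root;
* verifies the statics balance $\sin(\beta) + \cos(\beta)\tan(\alpha) = 1$ at that point;
* shows that changing $\ell$ shifts $f(x)$ without moving $x$.

```python
import math


def depth(x, r, d, ell):
    """f(x): how far below the ceiling the weight hangs, pulley offset x from A."""
    return math.sqrt(r * r - x * x) + ell - math.sqrt((d - x) ** 2 + r * r - x * x)


def depth_slope(x, r, d):
    """f'(x) = -x/sqrt(r^2 - x^2) + d/sqrt((d - x)^2 + r^2 - x^2); ell drops out."""
    return -x / math.sqrt(r * r - x * x) + d / math.sqrt((d - x) ** 2 + r * r - x * x)


def pulley_position_numeric(r, d, tol=1e-12):
    """Bisection for f'(x) = 0 on 0 < x < r (assumes r < d).
    f'(0) > 0 and f'(x) -> -infinity as x -> r, so a sign change is bracketed."""
    lo, hi = 0.0, r * (1.0 - 1e-12)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if depth_slope(mid, r, d) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pulley_position_closed_form(r, d):
    """Positive root of 2 d x^2 - r^2 x - r^2 d = 0."""
    return (r / (4.0 * d)) * (r + math.sqrt(r * r + 8.0 * d * d))


r, d = 3.0, 5.0  # illustrative cable length and anchor spacing, with r < d
x_num = pulley_position_numeric(r, d)
x_exact = pulley_position_closed_form(r, d)
print(f"bisection x = {x_num:.9f}, closed form x = {x_exact:.9f}")

# Statics check at the same point: sin(beta) + cos(beta) tan(alpha) should be 1.
s = math.sqrt(r * r - x_exact ** 2)
q = math.sqrt((d - x_exact) ** 2 + r * r - x_exact ** 2)
balance = s / q + ((d - x_exact) / q) * (s / x_exact)
print(f"sin(beta) + cos(beta) tan(alpha) = {balance:.12f}")

# The hanging depth depends on ell, but the optimal x does not.
for ell in (4.0, 10.0):
    print(f"ell = {ell}: f(x) = {depth(x_exact, r, d, ell):.6f}")
```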


As a final comment on de L’Hospital’s pulley problem, notice that
the solution value of $x$ has no dependence on either the weight $W$
or the length $\ell$ of cable attached to the weight. (For many people,
including me, this is not intuitively obvious!) Only the length of the
cable connected to the pulley ($r$) and the distance between the ceiling
connections ($d$) determine the location of the pulley. Of course,
the actual value of $f(x)$, the distance the weight hangs below the
ceiling, does depend on $\ell$.
FIGURE 5.8. Rainbows are sunlight scattered by raindrops.


5.8 Derivatives and the Rainbow 


For the final section of this chapter, giving yet another illustration 
of applying calculus to understand a physical problem, we return 
to Snell’s law. Imagine you are standing on a wide, level plain, 
with the sun at your back, as shown in figure 5.8. The sun is at angle
$\varphi$ above the horizon. In front of you the sky is full of raindrops,
either because of a storm or, perhaps, because you are watering the 
lawn with the garden hose set on spray. Some of the sun’s light 
rays will be scattered by the drops, i.e., through a combination of 
internal reflections and refractions by the drops, light rays will be 
bent backward and downward, into your eyes. This is the light you 
see as the rainbow (more, later, on the colors), one of the most 
beautiful of naturally occurring phenomena. As shown in figure 5.8, 
if we extend the line from the sun to the observer (this is called the 
anti-solar line) into the ground, then the primary rainbow appears
at an angle $\alpha$ of about 42° up from the anti-solar line. (You’ll soon
see why that is so!) An immediate consequence of this, of course, 
is that there is a primary rainbow “there to see” only if the sun 
is sufficiently low in the sky so that the 42° “up angle” places the 
rainbow in the sky at all; this is clearly not the case if the sun is 
higher than 42° above the horizon. So, from the ground you can see 
rainbows in the morning, and in the afternoon, but never when the 
sun is directly overhead. 

The search for understanding the origin of the rainbow was a long 
one, with speculations about it appearing in the writings of Aristotle
(he incorrectly thought reflections off of entire clouds, rather than
individual raindrops, were the mechanism involved). Indeed, when
the first human eyes looked up into a passing rain shower, thousands
of years ago, and saw the rainbow, who can doubt that it inspired
as much awe in those primitive humans as it did in Aristotle, and
does in us today? There often is also a secondary, much less bright
rainbow (with the colors in reverse order) visible as well, at an angle 
of about 52° up from the anti-solar line. Why is that, and are there 
even more rainbows in the sky? People have wondered about such 
questions for centuries. The definitive history of the search for the 
answers is given in the book by the mathematician Carl Boyer, The 
Rainbow: From Myth to Mathematics (Princeton University Press 1987; 
first published in 1959). Two beautiful, nonmathematical books on 
the rainbow, each with many spectacular color images, are Robert 
Greenler’s Rainbows, Halos, and Glories (Cambridge University Press
1980; Professor Greenler wrote the new introductory essay to the
Princeton reprint of Boyer’s book), and Color and Light in Nature (sec- 
ond edition) by David K. Lynch and William Livingston (Cambridge 
University Press 2001). 

To start our analysis, we need to model in detail what happens to 
light rays arriving at a typical raindrop (assumed to be a sphere) in 
the sky. Figure 5.8 is pretty thin on detail! Figure 5.9 shows one such 
incident ray, and what happens to it. 


1. As the ray arrives at point A on the drop’s surface, a fraction 
of it is reflected off of the surface and the rest is refracted into 
the drop. The angles of incidence and refraction are, as in
chapter 4, $\theta_i$ and $\theta_r$, respectively.
2. The portion refracted into the drop travels onward until,
upon arriving at point B on the back surface, a portion is
refracted back out into space and the rest is internally reflected
back into the drop. Since the sides OA and OB are equal in
length (both are radii of the spherical drop), the triangle OAB
is isosceles and the internal reflection angle is $\theta_r$.

3. The internally reflected portion continues to “bounce 
around” inside the drop as it experiences a reflection/ 
refraction with each additional interaction with the drop/air 
interface. Figure 5.9 shows only the refracted portion of the 
light ray that emerges from the drop at point C after the 
single reflection at point B. We’ll come back later to the 
portions that go on to further adventures inside the drop, 
which will explain the secondary rainbow. 


FIGURE 5.9. Detailed geometrical origin of the primary rainbow.
How much light is reflected and how much is refracted, at
each interaction between the light ray and the water drop’s 
air/water interface, is a complicated business. The answer de- 
pends on such details as the actual angle of incidence and the 
polarization of the light (which describes the electromagnetic 
details of the light). Fortunately, we don’t have to go into the 
physics of light that far; all we care about here is that some 
light does emerge at the proper angle to arrive at our eyes. Light 
that goes elsewhere is light we don’t see, in any case. For those 
who are interested in understanding how such intensity calcu- 
lations are done, there is no better place to start than with the 
beautiful paper by Jearl D. Walker, “Multiple Rainbows from 
Single Drops of Water and Other Liquids” (American Journal of 
Physics, May 1976, pp. 421-33). 


Concentrating for now on the ray emerging at point C, after just a
single internal reflection (and two refractions), we extend (as shown
in figure 5.9) the lines of the incident and the emergent rays until
they intersect at point I. This defines the angle $D$, which is the total
angular deviation suffered by the light ray from when it entered the
drop until it left the drop. As shown by the geometry of figures 5.8
and 5.9, the angle (what we might call the deflection angle) between
the incident and emerging rays is

$$\alpha = 2(2\theta_r - \theta_i),$$

and so

$$D = 180^\circ - \alpha = 180^\circ + 2\theta_i - 4\theta_r.$$


Now, from Snell’s law we have, with $n_1$ and $n_2$ the indices of
refraction of air and water, respectively,

$$\frac{\sin(\theta_i)}{\sin(\theta_r)} = \frac{n_2}{n_1} = n,$$

or

$$\theta_r = \sin^{-1}\left[\frac{1}{n}\sin(\theta_i)\right]$$

and

$$D = 180^\circ + 2\theta_i - 4\sin^{-1}\left[\frac{1}{n}\sin(\theta_i)\right].$$
n 


In particular, if the incident ray passes through the center of the
drop (through point O), then $\theta_i = 0^\circ$ and thus the deflection angle
is $\alpha = 0^\circ$ and the deviation angle is $D = 180^\circ$, i.e., the ray is reflected
back out of the drop along the same path as it entered. The center-passing
ray serves as an obvious reference axis.
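The deviation formula is easy to tabulate. This is a minimal sketch that assumes $n = 4/3$, the value used for water later in this section, and works in degrees. The function names are my own.

```python
import math

N = 4.0 / 3.0  # n = n2/n1 for water relative to air (the value used later in the text)


def refraction_angle(theta_i, n=N):
    """Snell's law, sin(theta_i) / sin(theta_r) = n; angles in degrees."""
    return math.degrees(math.asin(math.sin(math.radians(theta_i)) / n))


def deviation(theta_i, n=N):
    """Total deviation after one internal reflection: D = 180 + 2 theta_i - 4 theta_r."""
    return 180.0 + 2.0 * theta_i - 4.0 * refraction_angle(theta_i, n)


def deflection(theta_i, n=N):
    """alpha = 180 - D = 4 theta_r - 2 theta_i."""
    return 180.0 - deviation(theta_i, n)


for theta_i in (0.0, 30.0, 60.0, 89.0):
    print(f"theta_i = {theta_i:4.1f} deg: D = {deviation(theta_i):7.2f} deg, "
          f"alpha = {deflection(theta_i):6.2f} deg")
```

The center-passing ray ($\theta_i = 0$) comes out with $D = 180^\circ$ and $\alpha = 0^\circ$, matching the reference-axis case just described.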

Of course, the entire surface of the drop facing the sun receives 
rays, which we can assume are parallel rays because the sun is so very 
far away. There will be rays incident on the drop above the reference 
ray, and rays incident on the drop below the reference ray. From 
figure 5.9, which shows an incident ray above the center-passing 
reference ray, it is evident that such rays will emerge from below the 
reference ray. By symmetry, then, incident rays below the reference 
ray will emerge above the reference ray. An observer on the ground 
will therefore see rays of light emerging from the bottom portion of a 
drop due to incident rays illuminating the upper portion of the drop. 

The observed rays come out of the drop with various values for the
angle $\alpha$, but not all values are equally likely. This is easy to see if we
simply plot $\alpha$ as a function of where the incident ray strikes the drop.
The geometry of this is illustrated in figure 5.10, which shows the
center-passing ray as the horizontal (for ease in drawing) reference
axis, and a typical ray incident on the drop above the reference ray.
We can calculate the angle of incidence, $\theta_i$, as

$$\sin(\theta_i) = \frac{y}{R}, \qquad 0 \le y \le R,$$
where R is the radius of the spherical drop and y is the vertical
displacement of the incident ray from the reference ray. The reason
for formulating the mathematics this way is that, for a drop in
uniform sunlight, there are no preferred values for y. That is, in loose
probabilistic jargon, of all the rays striking the drop, a randomly
selected ray is as likely to have one value of y as any other (this is
not true for $\theta_i$; $\theta_i$ is not uniformly distributed from 0° to 90° for a
spherical drop in uniform sunlight).

FIGURE 5.10. Illuminating a raindrop.
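From Snell’s law, we have

$$\theta_r = \sin^{-1}\left[\frac{1}{n}\sin(\theta_i)\right] = \sin^{-1}\left[\frac{1}{n}\,\frac{y}{R}\right],$$

and so

$$\alpha = 4\theta_r - 2\theta_i = 4\sin^{-1}\left(\frac{y}{nR}\right) - 2\sin^{-1}\left(\frac{y}{R}\right), \qquad 0 \le y \le R.$$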

Figure 5.11 shows a plot of the angle $\alpha$ as y/R varies from 0 to 1 (the
actual value of R is, then, for our simple analysis here, unimportant),
using the value of n = 4/3 for a water drop in air, and it is obvious
that $\alpha$ has a maximum at about 42°. (Aha! This is the rainbow angle I
mentioned at the start of this section. You are almost at the point
of understanding the physical significance of this angle.) This is
interesting, yes, but what makes it really interesting is that it is
a broad maximum, i.e., there is a concentration of rays with
$\alpha$-angles at and around 42°. For example, 20% of the emergent light
(0.75 < y/R < 0.95) has an $\alpha$-angle in the narrow interval from
40° to 42°. The other 80% of the emergent light is (more or less)
uniformly distributed over the much larger $\alpha$-angle interval of 0°
to 40°.

FIGURE 5.11. The primary rainbow.
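To see the broad maximum numerically, here is a minimal Python sketch, assuming only the formula for $\alpha$ above with n = 4/3. The names `primary_alpha_deg` and `SAMPLES`, and the sampling grid itself, are my own illustrative choices. It scans y/R uniformly from 0 to 1, reports the largest $\alpha$ and where it occurs, and prints $\alpha$ at a few y/R values near the peak to show how little $\alpha$ changes there.

```python
import math

N_WATER = 4 / 3  # typical refractive index of water in air, as in the text


def primary_alpha_deg(x, n=N_WATER):
    """Return alpha = 4*theta_r - 2*theta_i in degrees for a ray striking at y/R = x."""
    theta_i = math.asin(x)      # sin(theta_i) = y/R
    theta_r = math.asin(x / n)  # Snell's law: sin(theta_r) = sin(theta_i)/n
    return math.degrees(4 * theta_r - 2 * theta_i)


# Sample y/R on a uniform grid over [0, 1], mimicking a drop in uniform sunlight.
SAMPLES = 100_001
xs = [k / (SAMPLES - 1) for k in range(SAMPLES)]
alphas = [primary_alpha_deg(x) for x in xs]

best = max(range(SAMPLES), key=alphas.__getitem__)
print(f"largest alpha ~ {alphas[best]:.2f} deg at y/R ~ {xs[best]:.4f}")

# How flat is the curve near its peak? Compare alpha across a band of y/R values.
for x in (0.80, 0.83, 0.86, 0.89, 0.92):
    print(f"y/R = {x:.2f}: alpha = {primary_alpha_deg(x):.2f} deg")
```

The uniform grid stands in for the "no preferred values of y" assumption; a finer grid only sharpens the location of the peak.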

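We can calculate the precise value of the maximum $\alpha$ directly,
using calculus. Since $\alpha = 2(2\theta_r - \theta_i) = 4\theta_r - 2\theta_i$, then

$$\frac{d\alpha}{d\theta_i} = 4\frac{d\theta_r}{d\theta_i} - 2 = 0 \quad\Longrightarrow\quad \frac{d\theta_r}{d\theta_i} = \frac{1}{2}$$

at the maximum. Then, differentiating Snell’s law (using the chain rule)
with respect to $\theta_i$, i.e., differentiating $\sin(\theta_i) = n\sin(\theta_r)$, we get

$$\cos(\theta_i) = n\cos(\theta_r)\frac{d\theta_r}{d\theta_i} = \frac{1}{2}\,n\cos(\theta_r).$$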
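Thus,

$$\cos^2(\theta_i) = \frac{1}{4}n^2\cos^2(\theta_r),$$

and since $\cos^2(\theta_r) = 1 - \sin^2(\theta_r)$, then

$$\cos^2(\theta_i) = \frac{1}{4}n^2\left[1 - \sin^2(\theta_r)\right] = \frac{1}{4}n^2\left[1 - \frac{1}{n^2}\sin^2(\theta_i)\right]$$

or

$$4\cos^2(\theta_i) = n^2 - \sin^2(\theta_i).$$

Since $\sin^2(\theta_i) = 1 - \cos^2(\theta_i)$, this becomes

$$4\cos^2(\theta_i) = n^2 - 1 + \cos^2(\theta_i),$$

or, when $\alpha$ is maximum, $\theta_i$ is given by

$$\theta_i = \cos^{-1}\sqrt{\frac{n^2 - 1}{3}}.$$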


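Now, as before,

$$\alpha = 4\theta_r - 2\theta_i = 4\sin^{-1}\left[\frac{1}{n}\sin(\theta_i)\right] - 2\theta_i,$$

and so, finally,

$$\alpha_{\max} = 4\sin^{-1}\left\{\frac{1}{n}\sin\left[\cos^{-1}\sqrt{\frac{n^2 - 1}{3}}\right]\right\} - 2\cos^{-1}\sqrt{\frac{n^2 - 1}{3}}.$$

For n = 4/3, this reduces to

$$\alpha_{\max} = 4\sin^{-1}\left\{\frac{3}{4}\sin\left[\cos^{-1}\left(\frac{1}{3}\sqrt{\frac{7}{3}}\right)\right]\right\} - 2\cos^{-1}\left(\frac{1}{3}\sqrt{\frac{7}{3}}\right) = 42.03^\circ,$$

just as shown in figure 5.11, and as had been known by direct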
observation of rainbows for centuries before Descartes. 
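The closed-form result can be cross-checked numerically. This Python sketch (the function names are hypothetical, chosen for illustration) evaluates $\theta_i = \cos^{-1}\sqrt{(n^2-1)/3}$, then $\alpha_{\max}$, and confirms with a central finite difference that $d\theta_r/d\theta_i$ really equals 1/2 at that $\theta_i$, which is the condition the calculus above imposed.

```python
import math


def primary_rainbow(n):
    """Return (alpha_max, theta_i, theta_r) in degrees using cos(theta_i) = sqrt((n^2 - 1)/3)."""
    theta_i = math.acos(math.sqrt((n * n - 1) / 3))
    theta_r = math.asin(math.sin(theta_i) / n)  # Snell's law
    alpha_max = 4 * theta_r - 2 * theta_i
    return math.degrees(alpha_max), math.degrees(theta_i), math.degrees(theta_r)


def refraction_angle(theta_i, n):
    """theta_r (radians) from Snell's law for an incidence angle theta_i (radians)."""
    return math.asin(math.sin(theta_i) / n)


n = 4 / 3
alpha_max, theta_i_deg, theta_r_deg = primary_rainbow(n)

# Check the calculus condition d(theta_r)/d(theta_i) = 1/2 with a central difference.
h = 1e-6
ti = math.radians(theta_i_deg)
slope = (refraction_angle(ti + h, n) - refraction_angle(ti - h, n)) / (2 * h)

print(f"theta_i = {theta_i_deg:.2f} deg, theta_r = {theta_r_deg:.2f} deg")
print(f"alpha_max = {alpha_max:.2f} deg, d(theta_r)/d(theta_i) = {slope:.6f}")
```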

The crucial observation, of a broad maximum for a, is due to 
Descartes, who discovered the concentration of emergent rays at
$\alpha = 42^\circ$ by a laborious tracing of many incident rays (using Snell’s law)
through a single gigantic, artificial drop (a glass spherical globe). As 
he wrote in Les Météores (“Meteorology”), another appendix to his 
1637 Discours de la Méthode: 


I took my pen and calculated in detail all the rays which fall on 
the various points of a drop of water, in order to see under what 
angles they could come toward our eyes after two refractions and 
one or two reflections. I found that after one reflection and two 
refractions, very many more of them can be seen under the angle 
of 41° to 42° than under any lesser one; and that none of them 
can be seen under a larger angle. [This gives the primary rainbow.] 
Then I also found that after two reflections and two refractions, 
very many more of them come toward the eye under a 51° to 52° 
angle, than under any larger one; and no such rays come under 
a lesser [angle]. [This gives the secondary rainbow, as you’ll see
soon.]


The concentration of light rays around the extrema of $\alpha$ is, of
course, in the very nature of an extremum; i.e., rays with $\alpha$-angles on
either side of $\alpha_{\max}$ have nearly equal $\alpha$-angles. We can now
understand the first central question about rainbows: why do they appear
as circular arcs in the sky? Geometry, alone, answers that. Figure 5.8 
shows just a single raindrop scattering a ray of light into the eyes of 
an observer on the ground. That drop, the observer, and the parallel 
rays of light incident on the drop, are all shown in the same plane 
(the plane defined by the page the figure is printed on). In the ac- 
tual world, however, there are infinitely many planes that contain 
the observer, raindrops, and parallel light rays incident on those 
raindrops. Those drops also reflect light rays into the eyes of the 
observer because, as a little mental imagery should convince you, 
all of the geometry of figures 5.8 and 5.9 is preserved if we rotate
those figures around the anti-solar line. That is, all of the raindrops 
scattering light back into the observer’s eyes lie on the surface of a 
cone with a central angle of about 84° (angular radius of 42°) and 
the anti-solar line as its axis; the light entering the observer’s eyes 
appears to come from a circular arc. 
But notice, too, that the distance of any particular drop from the
observer’s eyes is not important; all of the drops on the cone’s sur- 
face scatter light rays back down to the observer’s eyes, and so the 
rainbow is not in any particular place in the sky. The drops can be 
mere inches away, as well as many miles distant. And notice, too, 
that “the” cone is different for different observers. That means each 
observer receives scattered light from different sets of raindrops and 
so, while different observers see a rainbow, it is not the same
rainbow. Indeed, each eye of a lone observer “sees” a different rainbow.
The rainbow, then, as befits its ethereal beauty, is literally nowhere 
in particular and everywhere in general; if it is anywhere, it is in 
“the eye of the beholder”! 

The second central question, why the rainbow is multicolored,
requires more than geometry, which is why the answer escaped
Descartes and all those before him. The explanation comes from
the fact that n is not a constant; the value n = 4/3 (= 1.333) I used
to compute $\alpha_{\max}$ is simply a typical value of the refractive index of
water in the visible portion of the spectrum. As mentioned at the
end of the previous chapter, n depends on the frequency (color)
of the light rays, with n = 1.344 for violet and n = 1.331 for
red (the extreme ends of the visible spectrum). There will therefore
be a different value for $\alpha_{\max}$ for each color, and the various colors
will appear as distinctly separate but adjacent rainbows. If you run
the red and violet values for n through the equation for $\alpha_{\max}$, the
numbers work out to be

$$\alpha_{\max}(\text{red}) = 42.37^\circ,$$

$$\alpha_{\max}(\text{violet}) = 40.5^\circ.$$

Since $\alpha_{\max}(\text{red}) > \alpha_{\max}(\text{violet})$, the red rainbow appears higher in
the sky than does the violet rainbow, and so red and violet are
predicted to define the outer and inner edge colors of the rainbow,
respectively, just as is observed.
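Running the two quoted refractive indices through the same closed form takes only a few lines. The sketch below assumes n = 1.331 (red) and n = 1.344 (violet), the values given in the text; `primary_max_alpha_deg` is an illustrative name, not anything from the book.

```python
import math


def primary_max_alpha_deg(n):
    """alpha_max in degrees, from cos(theta_i) = sqrt((n^2 - 1)/3) and Snell's law."""
    theta_i = math.acos(math.sqrt((n * n - 1) / 3))
    theta_r = math.asin(math.sin(theta_i) / n)
    return math.degrees(4 * theta_r - 2 * theta_i)


indices = {"red": 1.331, "violet": 1.344}  # values quoted in the text
bows = {color: primary_max_alpha_deg(n) for color, n in indices.items()}

for color, angle in bows.items():
    print(f"{color:>6}: alpha_max = {angle:.2f} deg")
print(f"angular width of the primary bow ~ {bows['red'] - bows['violet']:.2f} deg")
```

The difference of roughly two degrees is the angular width of the primary bow in this simple model.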

Now, what of those light rays inside the drop shown in figure 5.9 
that do not exit the drop after just one internal reflection, but rather 
after two such reflections, and then enter the eyes of an observer 
on the ground? Such a light ray is shown in figure 5.12, which 
illustrates the fact that, for the exiting ray to be directed downward 
to earth, the incident ray must arrive at the bottom portion of
the drop and emerge from the upper portion.

FIGURE 5.12. Detailed geometrical origin of the secondary rainbow.

This is precisely the
opposite of what is depicted in figure 5.9, which gives rise to the
primary rainbow. The situation shown in figure 5.12 will give us,
instead, the secondary rainbow. Figure 5.12 again defines the angles $\alpha$
and D as, respectively, the angle between the incident and emergent
rays, and the total angular deviation experienced by the light ray
from when it enters the drop until it leaves the drop. From figure
5.12 it is clear that now $D = 180^\circ + \alpha$, i.e.,

$$\alpha = D - 180^\circ,$$

whereas in figure 5.9 (for a single internal reflection) we had
$\alpha = 180^\circ - D$.

To find $\alpha$, which is the “up-angle” from the anti-solar line to the
(secondary) rainbow, I’ll first find D and then subtract 180°. From
the geometry of figure 5.12, we see that when the incident ray enters
the drop by refraction it suffers an initial deviation of $\theta_i - \theta_r$, and
then at each internal reflection it undergoes an additional deviation
of $180^\circ - 2\theta_r$. Finally, at the second refraction that produces the
emergent ray, there is a final deviation, again, of $\theta_i - \theta_r$. (Indeed,
a look back at figure 5.9 shows we could have calculated D for the
primary rainbow in this manner rather than the way actually used.)
Thus, with two internal reflections, we have

$$D = (\theta_i - \theta_r) + 2(180^\circ - 2\theta_r) + (\theta_i - \theta_r) = 360^\circ + 2\theta_i - 6\theta_r.$$

And so

$$\alpha = D - 180^\circ = 180^\circ + 2\theta_i - 6\theta_r,$$

or

$$\alpha = 180^\circ - 2(3\theta_r - \theta_i).$$


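Now, just as we did before, let’s imagine a drop in uniform
sunlight and plot $\alpha$ as a function of y/R, where y is (again) the vertical
displacement of the incident ray from a horizontal center-passing
reference ray, and R is the radius of the drop. And, as before, from
Snell’s law we have

$$\theta_i = \sin^{-1}\left(\frac{y}{R}\right), \qquad \theta_r = \sin^{-1}\left(\frac{y}{nR}\right),$$

and so

$$\alpha = 180^\circ - 2\left[3\sin^{-1}\left(\frac{y}{nR}\right) - \sin^{-1}\left(\frac{y}{R}\right)\right], \qquad 0 \le y \le R.$$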


Figure 5.13 shows the result for n = 4/3, with $\alpha$ now exhibiting a
minimum (rather than the maximum we got for the primary
rainbow) at about 52°, just as reported by Descartes. Thus, we have the
secondary rainbow at about 10° higher in the sky than the primary
rainbow, which is just as observed when the secondary rainbow
can, in fact, be observed (it is, of course, much less bright than the
primary, only about 43% as bright, because of the additional loss
of intensity from the further adventures the light rays experience
within the water drops). Again, if we examine $\alpha$ as a function of n
(color), we find that $\alpha_{\min}$ is different for different colors, but now
there is (literally) a new twist: the color sequence in the secondary
rainbow is the reverse of that in the primary rainbow. That is, the red
rainbow will appear lower in the sky than does the violet rainbow, and
so red is predicted to be the inner edge color (and violet the outer
edge color) of the secondary rainbow. And that is precisely what
is seen.

FIGURE 5.13. The secondary rainbow.
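The same kind of scan works for the secondary bow. This Python sketch assumes the formula $\alpha = 180^\circ + 2\theta_i - 6\theta_r$ derived above and samples y/R on a uniform grid (an illustrative choice; the function names are hypothetical). For n = 4/3 it puts the minimum close to 51°, consistent with the 51° to 52° Descartes reported, and with the red and violet indices quoted earlier it shows the minimum for red falling below that for violet, which is the reversed color order.

```python
import math


def secondary_alpha_deg(x, n):
    """alpha = 180 + 2*theta_i - 6*theta_r (degrees) for two internal reflections, x = y/R."""
    theta_i = math.asin(x)
    theta_r = math.asin(x / n)
    return 180.0 + math.degrees(2 * theta_i - 6 * theta_r)


def secondary_min_alpha(n, samples=200_001):
    """Smallest alpha over a uniform grid of y/R values in [0, 1]."""
    return min(secondary_alpha_deg(k / (samples - 1), n) for k in range(samples))


typical = secondary_min_alpha(4 / 3)
red = secondary_min_alpha(1.331)
violet = secondary_min_alpha(1.344)

print(f"n = 4/3: alpha_min ~ {typical:.2f} deg")
print(f"red: {red:.2f} deg, violet: {violet:.2f} deg (red is now the lower, inner edge)")
```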


The secondary rainbow has occasionally appeared in fic- 
tional literature. In her episodic novel Strange Attractors (Viking 
1993), for example, Rebecca Goldstein ends her story with a 
description of a group of mathematicians running outdoors to 
observe a double rainbow. Her words are lovely to read, but 
flawed ever so slightly by positioning the secondary rainbow
in the wrong part of the sky: 


And outside the mathematicians all stand gathered together on 
the wet lawn, staring up into the western sky, where there’s a 
rare double rainbow stretching itself: The colors of the primary 
arc are intense and beneath [my emphasis] is the secondary 
rainbow, with its paler inversion of the spectrum. And all of 
the mathematicians are standing together in silence; on every 
face the same look of transfixed bliss. 


As was Descartes’ practice, he failed (for whatever reason) to ac- 
knowledge the prior work of others into the nature of the rainbow. 
In fact, the 42° angle of the primary rainbow can be found in the 
Opus Majus (1267) of the English philosopher and Franciscan friar 
Roger Bacon. And it was only a few decades later, in 1304, that 
the German monk Dietrich von Freiberg (1250-1310) advanced the 
correct explanation for the rainbow as the scattering of light by in- 
dividual raindrops. It is in his writing, too, that we find the conclu- 
sion that each observer sees a personal rainbow from different sets 
of drops. And not only that, it was Theodoric of Freiberg (as he is 
generally called in the English literature) who was the first to exper- 
iment with water-filled transparent containers, as artificial drops, to 
trace the paths of light rays. And not only that, it was Theodoric 
of Freiberg, not Descartes, who was the first to associate the pri- 
mary rainbow with two refractions and one internal reflection, and 
the secondary rainbow with two refractions and two internal
reflections. His small book De iride et radialibus impressionibus (“On the
Rainbow and ‘Radiant Impressions’ ”) put forth all of these funda- 
mental ideas, but not a word about any of it appears in Les Météores. 
To give Descartes his due, however (which is more than he did for his 
predecessors), discovery of the concentration of rays at the observed 
rainbow angle is Descartes’ alone. 

The primary (secondary) rainbow is the result of two refractions
and one (two) internal reflection(s). Wouldn’t three internal
reflections therefore produce yet another rainbow (the so-called tertiary
rainbow)? And why stop there? What of the possibility of
rainbows produced after N internal reflections, where N is any positive
integer? Such high-order rainbows would, of course, be expected to
be even dimmer, but perhaps sufficiently sensitive eyes could see 
them—if they exist. The question of higher-order rainbows, partic- 
ularly the tertiary, intrigued many people over the centuries, and 
they searched the sky for them. The logical place to look would seem 
to be in the sky above the secondary, which itself is 10° above the 
primary. Despite occasional claims to have seen the third-order rain- 
bow, however, nobody ever has seen it, and nobody ever will—even 
though it surely does exist! Calculus explains this apparent para- 
dox, with a calculation first done by Newton, probably some time 
around 1670. 

An easy extension of the analysis just done for D, the angle of 
total deviation experienced by a light ray from when it first arrives 
on the surface of a drop until it exits the drop, leads to the result 


$$D = (\theta_i - \theta_r) + N(180^\circ - 2\theta_r) + (\theta_i - \theta_r) = 2(\theta_i - \theta_r) + N(180^\circ - 2\theta_r)$$


if there are N internal reflections (for the particular case of the 
secondary rainbow, we used N = 2). Notice that for N = 1, the 
case of the primary rainbow, this expression reduces to
$D = 180^\circ + 2\theta_i - 4\theta_r$, which is, indeed, the result arrived at in the discussion at
the start of this section. As was the case for the first two rainbows,
all rainbows occur at the extrema of the angle D. So, differentiation
of D with respect to $\theta_i$ gives

$$\frac{dD}{d\theta_i} = 2 - 2\frac{d\theta_r}{d\theta_i} - 2N\frac{d\theta_r}{d\theta_i},$$

which, when set equal to zero, says

$$\frac{d\theta_r}{d\theta_i} = \frac{1}{1+N}$$

when D (for the Nth order rainbow) is at an extremum (which is
what gives rise to the concentration of observed light rays, i.e., the
rainbow).

Remembering the result we calculated earlier from a differentiation
of Snell’s law,

$$\cos(\theta_i) = n\cos(\theta_r)\frac{d\theta_r}{d\theta_i},$$

we therefore have

$$\frac{d\theta_r}{d\theta_i} = \frac{1}{n}\,\frac{\cos(\theta_i)}{\cos(\theta_r)} = \frac{1}{1+N}.$$


Cross-multiplication and squaring gives

$$(N+1)^2\cos^2(\theta_i) = n^2\cos^2(\theta_r).$$

From trigonometry and Snell’s law, we have

$$\cos^2(\theta_r) = 1 - \sin^2(\theta_r) = 1 - \frac{1}{n^2}\sin^2(\theta_i),$$

and so

$$(N+1)^2\cos^2(\theta_i) = n^2 - \sin^2(\theta_i) = n^2 - \left[1 - \cos^2(\theta_i)\right],$$

or

$$(N+1)^2\cos^2(\theta_i) = n^2 - 1 + \cos^2(\theta_i).$$


This is easily solved for $\cos(\theta_i)$ (note that $(N+1)^2 - 1 = N(N+2)$) to give
Newton’s equation for the condition that must be satisfied for the Nth
order rainbow:

$$\boxed{\cos(\theta_i) = \sqrt{\frac{n^2 - 1}{N(N+2)}}}$$

As a check, notice that for the case of N = 1 (the primary rainbow)
this does reduce correctly to the result calculated earlier:
$\cos(\theta_i) = \sqrt{(n^2 - 1)/3}$.
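Newton’s condition can be checked against the deviation formula it came from. In this sketch, `newton_theta_i`, `deviation_deg`, and `deviation_slope` are hypothetical helper names. The code computes $\theta_i$ from the boxed equation, evaluates $D = 2(\theta_i - \theta_r) + N(180^\circ - 2\theta_r)$, and estimates $dD/d\theta_i$ by a central difference, which should come out essentially zero for each N.

```python
import math


def newton_theta_i(N, n):
    """Newton's condition for the Nth-order bow: cos(theta_i) = sqrt((n^2 - 1) / (N(N + 2)))."""
    return math.acos(math.sqrt((n * n - 1) / (N * (N + 2))))


def deviation_deg(theta_i, N, n):
    """Total deviation D = 2(theta_i - theta_r) + N(180 deg - 2 theta_r), returned in degrees."""
    theta_r = math.asin(math.sin(theta_i) / n)
    return math.degrees(2 * (theta_i - theta_r) + N * (math.pi - 2 * theta_r))


def deviation_slope(N, n, h=1e-5):
    """Central-difference estimate of dD/d(theta_i) at Newton's theta_i (should be ~0)."""
    ti = newton_theta_i(N, n)
    return (deviation_deg(ti + h, N, n) - deviation_deg(ti - h, N, n)) / (2 * h)


n = 4 / 3
for N in (1, 2, 3):
    ti = newton_theta_i(N, n)
    print(f"N = {N}: theta_i = {math.degrees(ti):.2f} deg, "
          f"D = {deviation_deg(ti, N, n):.2f} deg, dD/dtheta_i ~ {deviation_slope(N, n):.1e}")
```

For N = 1 the printed D is the supplement of the 42° primary angle, since $\alpha = 180^\circ - D$ there.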

So, where is the tertiary rainbow? Inserting N = 3 (and using
n = 4/3), we have

$$\cos(\theta_i) = \sqrt{\frac{\left(\frac{4}{3}\right)^2 - 1}{(3)(5)}} = \frac{\sqrt{7}}{3\sqrt{15}},$$

or $\theta_i = 76.84^\circ$. And so, from Snell’s law,

$$\sin(\theta_r) = \frac{1}{n}\sin(\theta_i) = \frac{3}{4}\sin(76.84^\circ),$$

or

$$\theta_r = \sin^{-1}\left[\frac{3}{4}\sin(76.84^\circ)\right] = 46.91^\circ.$$

Thus,

$$D = 2(76.84^\circ - 46.91^\circ) + 3\left[180^\circ - 2(46.91^\circ)\right] = 318.4^\circ.$$
FIGURE 5.14. Detailed geometrical origin of the tertiary rainbow.

To understand what this value means, take a look at figure 5.14,
which shows the case of N = 3 internal reflections. It is clear from
the geometry there that the ray enters through the bottom portion
of the drop, bounces once nearly all around the inside of the drop,
exits the bottom portion of the drop, and so is scattered downward 
and forward out of the drop. That is, for an observer on the ground 
to see the scattered ray, she must turn around and look behind her! 
The tertiary rainbow is indeed “there” (if there are raindrops in the 
skies between the sun and the observer), and it is indeed higher 
in the sky than is the secondary. It is actually higher than straight 
up. Until Newton’s calculation, people had been looking forward 
with the sun behind them, just above the secondary rainbow, and 
that’s simply the wrong place to look. But even if somebody had 
turned around, they still wouldn’t have seen the tertiary rainbow 
because, in addition to its inherent dimness (the tertiary is only 
about 24% as bright as the primary), it is completely overwhelmed 
by the nearby glare of the sun, as shown in the following box. 
And that’s why nobody ever will see the natural tertiary rainbow. 
Artificially produced rainbows, generated in the laboratory with a 
laser playing the role of the sun, have let experimenters actually see 
rainbows up to at least N = 20. They are right where Newton’s boxed
equation for $\cos(\theta_i)$ says they should be.
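The tertiary numbers used above ($\theta_i = 76.84^\circ$, $\theta_r = 46.91^\circ$, $D = 318.4^\circ$) follow directly from Newton’s equation and Snell’s law. A minimal Python check, assuming n = 4/3 as in the text:

```python
import math

n, N = 4 / 3, 3  # water drop, three internal reflections
theta_i = math.acos(math.sqrt((n * n - 1) / (N * (N + 2))))  # Newton's condition
theta_r = math.asin(math.sin(theta_i) / n)                     # Snell's law
D = math.degrees(2 * (theta_i - theta_r) + N * (math.pi - 2 * theta_r))

print(f"theta_i = {math.degrees(theta_i):.2f} deg")
print(f"theta_r = {math.degrees(theta_r):.2f} deg")
print(f"D = {D:.1f} deg, i.e. {360 - D:.1f} deg short of a full turn")
```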


The tertiary rainbow has one last surprise for us—its shape. 
Celebrity intellectual Marilyn vos Savant stumbled on this 
point when replying to a reader’s question on where the third- 
order rainbow is. In her Parade Magazine column “Ask Marilyn” 
(August 4, 2002), she wrote that the tertiary rainbow “arches 
over [my emphasis] the second [i.e., secondary] one.” This is
not so, and here’s why. 

Just as the primary and secondary rainbows are rotationally 
symmetric around the anti-solar line (the observer is facing 
away from the sun), the tertiary rainbow is rotationally sym- 
metric around the solar line (the line from the sun to the ob- 
server who is now facing the sun). As shown in figure 5.15, 
the scattered ray from a typical raindrop that is between the 
sun and the observer makes an angle of (about) 41.6° with 
respect to the solar line. Because of the rotational symmetry, 
then, scattered light rays reach the observer from raindrops on 
the surface of a cone (with the solar line as the axis) with an
angular radius of 41.6°. That is, the tertiary rainbow is a circular
halo around the sun!

FIGURE 5.15. Locating the tertiary rainbow in the sky.
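The distinction between bows centered on the anti-solar point and the tertiary’s ring around the sun can be made mechanical. The sketch below is my own framing of the geometry, not Newton’s: it reduces D modulo 360°, measures the angle between the emerging ray and the direction the sunlight travels, and treats angles above 90° as light sent back toward an observer who faces away from the sun. The helper name `bow_position` is hypothetical.

```python
import math


def bow_position(N, n=4 / 3):
    """Return (center, angular_radius_deg) for the Nth-order bow at Newton's extremum.

    center is "anti-solar point" when the emerging light heads back toward the
    sun's side (observer faces away from the sun), or "sun" when it heads onward
    (observer must face the sun).
    """
    theta_i = math.acos(math.sqrt((n * n - 1) / (N * (N + 2))))
    theta_r = math.asin(math.sin(theta_i) / n)
    D = math.degrees(2 * (theta_i - theta_r) + N * (math.pi - 2 * theta_r)) % 360.0
    # Angle between the emerging ray and the direction in which sunlight travels.
    angle = min(D, 360.0 - D)
    if angle > 90.0:
        return "anti-solar point", 180.0 - angle
    return "sun", angle


for N in range(1, 5):
    center, radius = bow_position(N)
    print(f"N = {N}: ring of radius {radius:.1f} deg around the {center}")
```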


The “discovery” of the tertiary’s place in the sky has an interest- 
ing history. After Newton was appointed in late 1669, at age 26, to 
the Lucasian Professorship of Mathematics at Cambridge (the chair 
now held by the famous theoretical physicist Stephen Hawking), he 
gave a series of inaugural lectures during the period 1670-72. Those
lectures were not published at the time (they are available to the 
modern reader in The Optical Papers of Isaac Newton, edited by Alan 
E. Shapiro, Cambridge University Press 1984), but they did serve as 
the basis for his 1704 book Opticks. In his optical lectures, Newton 
discussed the rainbow, including calculations concerning rainbows 
beyond the secondary. It isn’t entirely clear if he used his general 
results to actually calculate the specific angular position of the ter- 
tiary rainbow (he certainly didn’t publish it), and in Opticks he wrote 
only that “The light which passes through a drop of rain after two 
refractions, and three or more reflections, is scarcely strong enough 
to cause a sensible bow.” There is no mention of the glare of the 
sun overwhelming the tertiary rainbow halo. Johann Bernoulli later 
reproduced Newton’s general approach, but he too failed to
specifically calculate the location of the tertiary. Like Newton, Bernoulli
made only a single suggestive comment, to the effect that while the
tertiary rainbow might be visible to eagles or lynxes, it would not be
visible to human eyes. The tertiary rainbow’s home in the heavens
was finally, and specifically, published in 1700, in the
Philosophical Transactions of the Royal Society. The author was Edmond Halley,
Newton’s friend and the cannon-shooter extraordinaire from earlier 
in this chapter. 

And, finally, to end this chapter on a cosmological note, what will 
rainbows look like in the very far future? This question is not quite as 
odd as you might think—when the sun is vastly older than it is now, 
it will be much less hot, and its spectrum will be predominantly at 
the longer (infrared) wavelengths. Will there still be a rainbow to 
be “seen”? There will be, indeed, but only seen by (nonhuman?) 
eyes that have adapted to the shifted spectrum. We know this is 
so, because there is an infrared rainbow in the sky right now, and 
it has been photographed. You can read how that was done in the 
article by Robert Greenler, “Infrared Rainbow” (Science, September
24, 1971, pp. 1231-32). Professor Greenler ends on a poetic note,
writing of his “fascination in ‘seeing’ for the first time an infrared 
rainbow which has hung in the sky undetected since before the 
presence of man on this planet.” And he found it right where all 
the math theory of this chapter says it should be. 


Solution to Steiner’s Problem in Section 5.1 


With the Steiner function written as

$$f(x) = x^{1/x} = e^{\ln(x^{1/x})} = e^{(1/x)\ln(x)} = e^{g(x)},$$

we have

$$g(x) = \frac{1}{x}\ln(x).$$
x 


Now, from our result in section 4.5 on how to differentiate a 
composite function, 
$$\frac{df}{dx} = e^{g(x)}\frac{dg}{dx} = e^{(1/x)\ln(x)}\left\{-\frac{1}{x^2}\ln(x) + \frac{1}{x}\cdot\frac{1}{x}\right\}$$

or

$$\frac{df}{dx} = \frac{x^{1/x}}{x^2}\left[1 - \ln(x)\right].$$


Since $x^{1/x}/x^2 > 0$ for all $x > 0$, then $f'(x) = 0$ only when
$1 - \ln(x) = 0$, i.e., when $x = e$.

We could now use the second derivative test to show that 
x =e gives a maximum, but we can see this more directly by 
simply observing that 


$$f(1) = 1$$
$$f(2) = 2^{1/2} > f(1)$$
$$f(3) = 3^{1/3} > f(2)$$
$$f(4) = 4^{1/4} = (2^2)^{1/4} = 2^{1/2} = f(2) < f(3).$$


That is, 


$$f(1) < f(2) < f(3) > f(4),$$


and so we expect f(x) to have a maximum at some x between 
2 and 4; notice that e = 2.718... satisfies that requirement. 
(Do you see why $f(3) = 3^{1/3} > f(2) = 2^{1/2}$? Just raise both
quantities to the sixth power and observe that $[f(3)]^6 = 3^2 = 9$
and $[f(2)]^6 = 2^3 = 8$. Since 9 > 8, then $f(3) > f(2)$.) Thus, the
maximum value of f(x) is 


$$f(e) = e^{1/e} = 1.444667861\ldots,$$


often called Steiner’s number. 
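A numerical search confirms the location of the maximum without using the derivative. This sketch uses golden-section search, a technique swapped in here for illustration (it is not the method of the text), on the interval [2, 4], where the comparison $f(1) < f(2) < f(3) > f(4)$ shows the peak must lie. The helper names are hypothetical.

```python
import math


def f(x):
    """The Steiner function x**(1/x) for x > 0."""
    return x ** (1.0 / x)


def golden_section_max(func, lo, hi, tol=1e-10):
    """Locate the maximum of a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) > func(d):
            b, d = d, c                # maximum lies in [a, old d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                # maximum lies in [old c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2


x_star = golden_section_max(f, 2.0, 4.0)
print(f"maximum near x = {x_star:.6f} (e = {math.e:.6f}), f = {f(x_star):.9f}")
print(f"e**(1/e) = {math.e ** (1 / math.e):.9f}")
print([round(f(k), 4) for k in (1, 2, 3, 4)])
```

Because the curve is very flat near its peak, floating-point comparisons limit the located x to roughly seven significant figures, which is still plenty to identify it as e.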


6. 


Beyond Calculus 


6.1 Galileo’s Problem 


The story of Galileo Galilei (1564-1642), and of his research into 
the physics of free-fall by dropping various weights from the top 
of the Leaning Tower of Pisa, is too well known to be retold here. 
Whatever the truth of the details of that story, it is undeniable that 
the Italian astronomer was deeply interested in how things move 
under the influence of gravity. It was that interest that eventually 
led to what is generally thought to be the first solved problem in 
the calculus of variations, which was the next great step beyond the 
calculus of Newton and Leibniz in solving minimization problems. 
Galileo’s own attempt at the original version of the problem was one 
of mixed success and, indeed, one that still prompts some debate 
among historians. 

Galileo did the work that set the stage for the ultimate version 
of the so-called “minimum descent time” problem during the fi- 
nal, most troubled years of his life, troubles caused by his belief in 
Copernicanism. Copernicanism teaches that all the planets (includ- 
ing Earth) orbit the sun, not a stationary Earth. In direct contra- 
diction with Biblical scripture, such a belief was bound to lead to a 
collision with the Church. After the 1632 publication of his book 
Dialogue Concerning the Two Chief World Systems, in which he ad- 
vocated positions in conflict with religious teachings, Galileo was 
summoned to Rome in 1633 on the charge of suspicion of heresy. 
He was sick in bed when summoned, and so he declined to make 
the journey. He perhaps first realized how precarious was his posi- 
tion when the Pope (a friend of many years!) threatened to forcibly 
transfer him to Rome, in chains, if he continued to refuse. So he
went, but the “trial” was a farce, with an outcome no one could 
doubt. 

His very life hung in the balance, and he was lucky to get off with 
“only” the placing of the Dialogue on the Index (of forbidden books), 
a prohibition against ever publishing again, being forced to recant, 
and imprisonment (later commuted to house arrest, with surveil- 
lance, for life). Although now very sick and nearly blind, Galileo 
proved to be tougher than the religious thugs of the Inquisition; 
he used his cruel confinement to write one more book, Discourses 
and Mathematical Demonstrations Concerning Two New Sciences. It was 
smuggled out of Italy and published in Holland in 1638, just as 
Descartes and Fermat were doing battle in France over Snell’s law. 
Galileo’s new, groundbreaking ideas on how things fall in gravity 
were described in that final work. 

FIGURE 6.1. A bead sliding under gravity along a vertical, circular wire.

Imagine a bead with a wire threaded through a hole in it, such
that the bead can slide (with no friction) along the wire. Suppose
that the wire is bent into the shape of a circular arc with radius L,
and positioned vertically. The bead is held at point D, as shown in
figure 6.1, so that the radius to the bead makes angle $\alpha$ with the
vertical. We then release the bead, which slides to the bottom of the
wire at point C. That is, the bead makes a circular descent under the 
influence of gravity. An immediate and natural question to ask is, 
how long does the descent take? It was far beyond the mathematics 
of Galileo’s day to compute the precise answer, and his approach to 
the problem was via ingenious geometrical constructions. Today we
can compute the answer (see appendix E for the details): if T is the
descent time, and g denotes the acceleration of gravity, then

$$T = \sqrt{\frac{L}{g}}\int_0^{\pi/2}\frac{d\beta}{\sqrt{1 - k^2\sin^2(\beta)}}, \qquad k = \sin\left(\tfrac{1}{2}\alpha\right),$$


an expression that would have been meaningless to Galileo (or, for 
that matter, to any other mathematician of the first half of the sev- 
enteenth century). Instead, Galileo used inclined planes as approx- 
imations to a circular arc to calculate approximations to the time of 
descent. What I’ll show you here is a modern treatment of Galileo’s 
ideas, although his development was strictly geometric (and very 
subtle). You can find the original geometric approach discussed in 
the paper by Herman Erlichson, “Galileo’s Work on Swiftest Descent 
from a Circle and How He Almost Proved the Circle Itself Was the 
Minimum Time Path” (American Mathematical Monthly, April 1998, 
pp. 338-47). 

As Professor Erlichson pointed out in an earlier paper [“Galileo’s 
Pendulums and Planes” (Annals of Science, May 1994, pp. 263-72)], 
the original motivation for Galileo’s interest in the question of the 
descent time along a vertical circular path came from his interest in 
pendulum motion; a light fixture hanging from a chain attached to 
the ceiling of a church executes a circular swing when disturbed by 
an earthquake. The period of such a swing (the time for one complete 
swing, from the starting point of the fixture back to the starting 
point) would thus be given by 4T, a value Galileo incorrectly
believed to be independent of $\alpha$ (the amplitude of the swing). Galileo
was wrong but, actually, not by very much. 
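To see how little the swing period depends on amplitude for small swings, the sketch below evaluates the integral for T via the arithmetic-geometric mean (a standard way to compute complete elliptic integrals, assumed here rather than taken from appendix E) and compares the period 4T with $2\pi\sqrt{L/g}$, the usual small-amplitude pendulum period, which is also not stated in this section. Function names are illustrative.

```python
import math


def K_agm(k):
    """Complete elliptic integral of the first kind, K(k), via the arithmetic-geometric mean."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(30):  # quadratic convergence: 30 rounds is far more than needed
        a, b = (a + b) / 2, math.sqrt(a * b)
    return math.pi / (2 * a)


def period_ratio(amplitude_deg):
    """Period 4T = 4 sqrt(L/g) K(sin(alpha/2)) divided by the small-swing value 2 pi sqrt(L/g)."""
    k = math.sin(math.radians(amplitude_deg) / 2)
    return 4 * K_agm(k) / (2 * math.pi)


for amp in (5, 10, 20, 45, 90):
    print(f"amplitude {amp:>2} deg: period is {period_ratio(amp):.4f} x the small-swing value")
```

For a 10° swing the period differs from the small-swing value by only about 0.2%, which is why Galileo’s belief was so nearly right.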

The first, crudest approximation to circular descent, using straight
line segments (or inclined planes, as Galileo thought of the
approximations), would be descent along the direct, single segment
connecting D and C. The next, somewhat less crude approximation
would use the broken line descent along two inclined planes (D to
B, then B to C), as shown in figure 6.2.

FIGURE 6.2. Galileo’s approximation to a circular wire.

The arc DBC is, as drawn in
that figure, one-quarter of a circle of radius L, centered on M. Point 
B is arbitrary, with the radius from M to B making angle $\theta_1$ with
the radius from M to D (if $\theta_1 = 0^\circ$ then B = D, and if $\theta_1 = 90^\circ$
then B = C). What I’ll do next is derive $T_D$ and $T_B$, the times for the
bead (starting from rest) to slide under gravity from D to C along
the Direct path and along the Broken path, respectively.

The calculation of $T_D$ is easy, once you observe that the bead’s
speed during the descent increases linearly from $v_D = 0$ at D to
$v_C = \sqrt{2gL}$ at C. The linear part follows from the fact that, along
the entire, direct path from D to C, the acceleration of the bead
by gravity is constant. The expression for $v_C$ follows from simply
equating the change in the bead’s kinetic energy of motion from
D to C to the change in its potential energy of position (since we
are assuming zero friction, then conservation of energy holds). So,
if the bead has mass m,


$$\frac{1}{2}mv_C^2 = mgL,$$

and so, as claimed,

$$v_C = \sqrt{2gL}.$$


The average speed of the descent is then given by

$$\frac{v_C + v_D}{2} = \frac{1}{2}\sqrt{2gL}.$$


The length of the direct path from D to C is obviously

$$\sqrt{L^2 + L^2} = L\sqrt{2},$$

and so

$$T_D = \frac{L\sqrt{2}}{\frac{1}{2}\sqrt{2gL}} = 2\sqrt{\frac{L}{g}}.$$


As shown in appendix E, if $\alpha = 90^\circ$, then the time for true circular
descent on the quarter circle is $T = 1.8541\sqrt{L/g}$, and so $T_D$ is less
than 8% longer than T, i.e.,

$$\frac{T_D}{T} = \frac{2\sqrt{L/g}}{1.8541\sqrt{L/g}} = 1.0787.$$
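Both numbers in this comparison can be reproduced. The Python sketch below computes the integral for T with $\alpha = 90^\circ$ in two independent ways (the arithmetic-geometric mean and a plain midpoint rule; both are my choices, not the book’s) and divides $T_D = 2\sqrt{L/g}$ by the result. Working in units of $\sqrt{L/g}$ keeps L and g out of the calculation.

```python
import math


def K_agm(k):
    """Complete elliptic integral of the first kind, K(k), via the arithmetic-geometric mean."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(30):  # quadratic convergence: 30 rounds is far more than needed
        a, b = (a + b) / 2, math.sqrt(a * b)
    return math.pi / (2 * a)


def K_midpoint(k, steps=10_000):
    """The same integral, int_0^{pi/2} dbeta / sqrt(1 - k^2 sin^2(beta)), by the midpoint rule."""
    h = (math.pi / 2) / steps
    return h * sum(1 / math.sqrt(1 - (k * math.sin((j + 0.5) * h)) ** 2)
                   for j in range(steps))


k = math.sin(math.radians(90) / 2)  # quarter-circle descent: alpha = 90 deg
K_a, K_m = K_agm(k), K_midpoint(k)  # T = K * sqrt(L/g)
T_D_units = 2.0                     # straight chord: T_D = 2 sqrt(L/g)

print(f"T ~ {K_a:.4f} sqrt(L/g) (AGM), {K_m:.4f} sqrt(L/g) (midpoint rule)")
print(f"T_D / T ~ {T_D_units / K_a:.4f}")
```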


Galileo didn’t know this, but he did know one astonishing fact about
$T_D$: it is independent of the position of D. In the above discussion,
I took D as at the top end of a quarter-circular arc, but if we instead
let the radius from M to D be at some (arbitrary) angle $\theta$ below the
full quarter circle (see figure 6.3), we’ll find the descent time remains
unchanged. This is, I think, not at all obvious, but it is not hard to
demonstrate.

FIGURE 6.3. Time of descent is independent of D.

Since we now have the bead’s initial position, D, decreased
vertically by the amount $h = L\sin(\theta)$, then the vertical drop of the bead
during its descent is $L - L\sin(\theta)$. Thus, its speed at C is

$$v_C = \sqrt{2gL\left[1 - \sin(\theta)\right]}$$

and its average speed during the descent is $\tfrac{1}{2}v_C$, just as before. The
length of the direct path is now

$$2\ell = 2L\sin\left(45^\circ - \tfrac{1}{2}\theta\right),$$
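and so the time of descent ($T_\theta$) is (in our notation, $T_D = 2\sqrt{L/g}$ is
the special case of $T_{\theta = 0^\circ}$)

$$T_\theta = \frac{2L\sin\left(45^\circ - \tfrac{1}{2}\theta\right)}{\tfrac{1}{2}\sqrt{2gL\left[1 - \sin(\theta)\right]}} = 2\sqrt{\frac{2L}{g}}\,\frac{\sin\left(45^\circ - \tfrac{1}{2}\theta\right)}{\sqrt{1 - \sin(\theta)}}$$

or

$$T_\theta = 2\sqrt{\frac{L}{g}}\left\{\frac{\sqrt{2}\,\sin\left(45^\circ - \tfrac{1}{2}\theta\right)}{\sqrt{1 - \sin(\theta)}}\right\}.$$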

This looks complicated but, in fact, the quantity in the braces 
equals one for all 6! This is so because, from the trigonometric addi- 
tion identity for the sine, 


sin (45 — 56] sin(45°) cos( 5) — cos(45°) sin( 50) 
0 I fp 


/ 1 — sin(@) / 1 — sin(@) 
i = i ; Ly li ; ng 
_ Pane ) Fa sin( 5 ) _ cos( 5 } = sin( 5 ) 
1 — sin(6) J1 — sin(@) ) 


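If we square this last expression and then use the trigonometric
identity $\sin(\alpha)\cos(\beta) = \frac{1}{2}\{\sin(\alpha - \beta) + \sin(\alpha + \beta)\}$, we get

$$\frac{\cos^2\left(\frac{1}{2}\theta\right) - 2\cos\left(\frac{1}{2}\theta\right)\sin\left(\frac{1}{2}\theta\right) + \sin^2\left(\frac{1}{2}\theta\right)}{1 - \sin(\theta)} = \frac{1 - \sin(\theta)}{1 - \sin(\theta)} = 1,$$

and so $T_\theta = T_D$, for any $\theta$, not just for $\theta = 0^\circ$. Amazing!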
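A quick numerical check of this result: the sketch below codes the chord length $2L\sin(45^\circ - \frac{1}{2}\theta)$ and the average speed $\frac{1}{2}v_C$ exactly as above and divides one by the other. The helper name `direct_descent_time` is hypothetical, and L = 1, g = 9.81 are assumed values (any positive values give the same ratios).

```python
import math

def direct_descent_time(theta_deg: float, L: float = 1.0, g: float = 9.81) -> float:
    """Descent time along the chord DC when D is theta degrees below the top of the arc."""
    theta = math.radians(theta_deg)
    chord = 2 * L * math.sin(math.radians(45) - theta / 2)   # 2*ell = 2L sin(45 deg - theta/2)
    v_c = math.sqrt(2 * g * L * (1 - math.sin(theta)))       # speed on arrival at C
    return chord / (0.5 * v_c)                               # constant acceleration: average speed is v_C / 2

T_D = 2 * math.sqrt(1.0 / 9.81)   # the theta = 0 result, 2*sqrt(L/g)
for deg in (0, 15, 30, 60, 85):
    print(deg, direct_descent_time(deg) / T_D)   # each ratio equals 1 up to rounding
```

The angle is kept below 90° because at $\theta = 90^\circ$ the point D coincides with C and the formula becomes 0/0.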
Let’s next calculate $T_B$, the descent time along the broken path
DBC in figure 6.2. As before, $v_D = 0$ at D. To get to B, the bead falls
through a vertical distance of $L\sin(\theta_1)$ and so $v_B = \sqrt{2gL\sin(\theta_1)}$.
And, as before, at the end of the descent, $v_C = \sqrt{2gL}$. Also as before,
since the accelerations on DB and BC are constant (although not
equal), the speed of the bead along each segment increases
linearly with time from its initial speed to its final speed on each segment. So,
the average speed on DB is $\frac{1}{2}\sqrt{2gL\sin(\theta_1)}$, and the average speed on
BC is $\frac{1}{2}\{\sqrt{2gL} + \sqrt{2gL\sin(\theta_1)}\}$. The lengths of the two segments are
$DB = 2\ell_1 = 2L\sin(\frac{1}{2}\theta_1)$ and $BC = 2\ell_2 = 2L\sin(\frac{1}{2}\theta_2)$ and thus, the
time of descent, along the two-segment broken path from D to C, is


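$$T_B = \frac{2L\sin\left(\frac{1}{2}\theta_1\right)}{\frac{1}{2}\sqrt{2gL\sin(\theta_1)}} + \frac{2L\sin\left(\frac{1}{2}\theta_2\right)}{\frac{1}{2}\left\{\sqrt{2gL} + \sqrt{2gL\sin(\theta_1)}\right\}}$$

$$= 2\sqrt{2}\,\sqrt{\frac{L}{g}}\left\{\frac{\sin\left(\frac{1}{2}\theta_1\right)}{\sqrt{\sin(\theta_1)}} + \frac{\sin\left(\frac{1}{2}\theta_2\right)}{1 + \sqrt{\sin(\theta_1)}}\right\}$$

or, at last, since $\theta_2 = 90^\circ - \theta_1$,

$$T_B = 2\sqrt{2}\,\sqrt{\frac{L}{g}}\left\{\frac{\sin\left(\frac{1}{2}\theta_1\right)}{\sqrt{\sin(\theta_1)}} + \frac{\sin\left(45^\circ - \frac{1}{2}\theta_1\right)}{1 + \sqrt{\sin(\theta_1)}}\right\}.$$

We can compare $T_B$ to $T_D$ by studying their ratio as a function of
$\theta_1$, i.e.,

$$R = \frac{T_B}{T_D} = \sqrt{2}\left\{\frac{\sin\left(\frac{1}{2}\theta_1\right)}{\sqrt{\sin(\theta_1)}} + \frac{\sin\left(45^\circ - \frac{1}{2}\theta_1\right)}{1 + \sqrt{\sin(\theta_1)}}\right\}.$$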


A plot of R is given in figure 6.4, which shows that R < 1 for all
$\theta_1$ strictly between 0° and 90°, which means the bead always takes
less time to descend along a broken path (even though it is the
longer path) than along the direct path. (It is geometrically obvious
that the broken path is longer than the direct path.) Only when
$\theta_1 = 0^\circ$ or $90^\circ$ is R = 1, as we should expect, since in
both cases the broken path degenerates into the direct path. The
plot shows that the time of descent is minimized when $\theta_1$ is around
25°, although it is not a sharp minimum. A careful examination
of the plot shows that the minimum value of R is 0.9313, i.e., at
the minimum $T_B = 0.9313T_D$. Since $T_D = 1.0787T$, then at the
minimum of figure 6.4, we have $T_B = (0.9313)(1.0787)T = 1.0046T$.
With just a two-segment approximation to the circle, then, we can
have a descent time less than ½ of 1% greater than the circular
descent time.

FIGURE 6.4. The two-segment broken line is almost as fast as the circle!
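The minimum read off figure 6.4 can also be found without a plot by simply scanning R on a fine grid. This is a brute-force sketch: the function name `R` mirrors the ratio above, and the 0.01° grid spacing is an arbitrary choice, not how the figure was produced.

```python
import math

def R(theta1_deg: float) -> float:
    """Ratio T_B / T_D for the broken path DBC, with B at angle theta1 on the arc."""
    t = math.radians(theta1_deg)
    s = math.sqrt(math.sin(t))
    return math.sqrt(2) * (math.sin(t / 2) / s + math.sin(math.pi / 4 - t / 2) / (1 + s))

# Scan the open interval 0 < theta1 < 90 degrees in 0.01-degree steps.
grid = [k / 100 for k in range(1, 9000)]
best = min(grid, key=R)
print(best, R(best))            # best lies between 24 and 25 degrees; R(best) is about 0.9313
print(R(best) * 2 / 1.8541)     # T_B / T at the minimum, about 1.0046
```

Because the minimum is so flat, the location of the best angle is far less sharply determined than the minimum value of R itself.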

With this result ($T_B < T_D$) in hand, Galileo then made his first
mistake. He argued that the double-broken path (DBEC, shown in
figure 6.5) would have a descent time even shorter than $T_B$. This
conclusion is correct, but his reasoning was not. He argued that, in
terms of time, the single-broken path DBC is such that

$$DC > DB + BC,$$

and that the single-broken path BEC is such that

$$BC > BE + EC.$$

FIGURE 6.5. Galileo’s mistake.


The first statement is of course true (we derived it!), but the second 
does not follow from our analysis because in the first analysis we 
assumed that the initial velocity is zero (as it is at D). But the initial 
velocity is not zero at B. By continuing to add more and more break 
points along the circular arc, Galileo concluded that the fastest way 
from D to C was along the circle itself, which is true (but, again, his 
reasoning was faulty). What some historians think he meant was that
this is so if all the break points must be on the circle. Others think he 
meant the circle was the fastest descent curve of all possible curves 
from D to C. In fact, it is not, as the next section will demonstrate. 


6.2. The Brachistochrone Problem 


Once Galileo’s original problem had focused attention on the general
problem of gravitational descent, it was natural to ask next:
what is the curve of swiftest descent? Mathematicians of
the caliber of the Bernoulli brothers, Newton, and Leibniz knew that 
Galileo’s analysis had not established that it is a circular arc. What 
if, they asked, the broken-line approximation to the descent curve 
was no longer constrained to have all of its endpoints on a circular 
arc—perhaps then there could be an even “faster” curve. 

It was this problem, of determining what is called the brachis- 
tochrone, that Johann Bernoulli posed “to the most acute mathemati- 
cians of the entire world” in June 1696. (The name comes from the 
Greek brachistos (shortest) and chronos (time) and is due to Bernoulli. 
Leibniz preferred tachystoptote, from tachystos (swiftest) and piptein 
(to fall), but deferred to Bernoulli.) Notice that this is not a problem 
of “ordinary” calculus, where what is asked for is the particular value 
of a variable that minimizes a function of that variable. Rather, we 
are now to find the function (i.e., a particular entire curve) that mini- 
mizes some other function (the so-called functional) whose indepen- 
dent “variable” takes on “values” from the set of all possible curves 
connecting two given points (in the brachistochrone problem, the 
“other function” is the descent time). This is an entirely new sort of 
minimization problem, and its solution initiated a new branch of 
mathematics—the calculus of variations. 

Bernoulli’s challenge to find the brachistochrone was accepted 
by some of the great mathematical minds of the day, but it was 
Bernoulli’s own original solution that was the most beautiful and 
compelling, using a brilliant application of Fermat’s principle of 
least time and Snell’s law. In a 1697 letter, Bernoulli claimed to have 
had, however, no prior knowledge of Galileo’s work on gravitational 
descent, and perhaps he was being honest. It strikes me as most un- 
likely, however, that Bernoulli could really have been so unaware— 
it wasn’t as if Galileo had published his circular descent analysis in
some obscure journal. Discourses was a famous book! In addition, it 
is known that Johann Bernoulli had an extraordinarily jealous na- 
ture, and hated to share credit in mathematical work. We’ve already 
seen that side of him in the affair over who really wrote de L’ Hospi- 
tal’s calculus book, and it was on display again in a later, very ugly 
business with his own son, Daniel, an accomplished mathematician 
in his own right. Daniel’s important book Hydrodynamica was pub- 
lished in 1738, just as his father’s similarly titled book Hydraulica 
was being published. Rather than being proud of his son, Johann 
claimed he had priority, even though he knew Daniel had actually 
finished his writing several years earlier. If Johann would deny his 
own son honest credit, then it is difficult to believe he would worry 
much about denying the long-dead Galileo any credit for motivat- 
ing the brachistochrone problem. 

Still, while Johann Bernoulli apparently had a serious problem 
with intellectual honesty, it cannot be denied he was a genius. His 
solution for the brachistochrone would alone ensure his mathematical
fame. Here’s how he did it. From Snell’s law, as correctly
explained by Fermat’s invoking of the principle of least time (see 
section 4.6), we have 

$$\frac{\sin(\theta_1)}{v_1} = \frac{\sin(\theta_2)}{v_2} = \text{constant}$$


for a light ray traveling in the two mediums from B to A (speeds
$v_1$ and $v_2$ in the upper and lower mediums, respectively), shown in
figure 6.6. That figure is similar to figure 4.10 (here I have relabeled
the two angles as $\theta_1$ and $\theta_2$), where it was understood that
$v_2 < v_1$ (the upper medium, 1, is less dense than the lower medium,
2, as would be the case for medium 1 as air and medium 2 as water).
We could, however, simply reverse the path of the ray to get figure 
6.7, which is just figure 6.6 flipped over. Snell’s law is still true for 
Figure 6.7, of course, just as written above. 

Now, imagine that instead of just the two mediums of figure 6.7, 
there are a great many layered mediums, each less dense than the 
layer above it. Then the light ray’s speed increases as it penetrates 
the layers in the downward direction, and the ray bends ever more 
away from the vertical, as does the ray path illustrated in figure 6.8. 
FIGURE 6.6. Snell’s refraction geometry, again.


As we let the number of layers increase (and the thickness of each 
layer decrease) without bound, the path becomes a smooth curve, 
and at every point along this curve we will have 


$$\frac{\sin(\theta)}{v} = \text{constant}.$$


Bernoulli’s brilliant insight into how to solve the minimum-descent- 
time problem was to turn the above argument on its head. That is, 
if the above condition is the result of assuming minimum travel (de- 
scent) time (Fermat’s principle of least time for light), then starting 
with the above condition should result in the curve of minimum 
descent time. 

Therefore, as shown in figure 6.9, I have sketched the curve of
minimum descent time (whatever it is!) from B (the origin) to A,
with $\theta$ as the angle between the tangent at an arbitrary point (x, y)
on the curve and the vertical. Notice that the positive y-axis points
downward because we are studying a falling bead.

FIGURE 6.7. Snell’s refraction geometry, again (flipped).
FIGURE 6.8. Layered approximation to a variable-density optical medium.
FIGURE 6.9. Geometry of Bernoulli’s solution.

At the arbitrary point (x, y) the speed of the descending bead along the curve is v.
If we assume, as in Galileo’s original analysis, that the bead starts 
its descent from B with zero initial speed, then conservation of 
energy says (for a bead with mass m) that the loss of potential energy 
(mgy) equals the gain in kinetic energy ($\frac{1}{2}mv^2$), and so, after falling
through a vertical distance of y, the speed of the bead is

$$v = \sqrt{2gy}.$$


So, Bernoulli’s ingenious approach to the brachistochrone problem 
is “simply” to imagine that the “speed of light” in a variable-density
optical medium is $\sqrt{2gy}$ and to find the path a ray of light will
follow, because light takes the least-time path. This solution could
only have occurred to a mind equally at home with mathematics 
and physics. Mathematical skill alone would not have been enough. 
As de L’Hospital wrote to Bernoulli in a letter dated June 15, 1696,
“This problem [of minimum descent time] seems to be one of the 
most curious and beautiful that has ever been proposed, and I would 
very much like to apply my efforts to it, but for this it would be 
necessary that you reduce it to pure mathematics, since physics 
bothers me.” 
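Bernoulli’s optical picture can be tried out numerically. The sketch below is our own illustration, not Bernoulli’s calculation: it slices the medium into thin horizontal layers, gives the layer at depth y the “speed of light” $\sqrt{2gy}$, holds $\sin(\theta)/v$ at a chosen constant k from layer to layer (Snell’s law), and adds up the horizontal run $dy\tan(\theta)$ across each layer. The function name `layered_ray_path` and its parameters are hypothetical. If the argument is right, the ray should land on the cycloid derived later in this section; matching $\sin(\theta) = k\sqrt{2gy}$ against that result gives the cycloid scale $a = 1/(4gk^2)$, and a ray traced down to depth $y = a$ should then have run $x = a(\pi/2 - 1)$ horizontally.

```python
import math

def layered_ray_path(k: float, y_end: float, layers: int, g: float = 9.81) -> float:
    """Horizontal run of a 'light ray' traced down through thin horizontal layers.

    In the layer at depth y the speed is v = sqrt(2*g*y), and Snell's law keeps
    sin(theta) / v = k from layer to layer (theta measured from the vertical).
    """
    dy = y_end / layers
    x = 0.0
    for i in range(layers):
        y_mid = (i + 0.5) * dy                  # depth at the middle of layer i
        s = k * math.sqrt(2 * g * y_mid)        # sin(theta) in this layer
        x += dy * s / math.sqrt(1 - s * s)      # run across the layer: dy * tan(theta)
    return x

g = 9.81
k = 1 / math.sqrt(4 * g)           # chosen so that a = 1 / (4 g k^2) = 1
a = 1 / (4 * g * k * k)
x_ray = layered_ray_path(k, y_end=a, layers=100_000, g=g)
x_cycloid = a * (math.pi / 2 - 1)  # cycloid point with y = a, i.e. beta = pi/2
print(x_ray, x_cycloid)
```

Stopping at depth $y = a$ keeps $\sin(\theta) \le 1/\sqrt{2}$, so the ray never turns horizontal within the traced range.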

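From the geometry of figure 6.9 it is clear that (with $\alpha$ the angle
the tangent makes with the horizontal, so that $\tan(\alpha) = dy/dx$)

$$\sin(\theta) = \cos(\alpha) = \frac{1}{\sec(\alpha)} = \frac{1}{\sqrt{1 + \tan^2(\alpha)}} = \frac{1}{\sqrt{1 + \left(\frac{dy}{dx}\right)^2}} = \frac{1}{\sqrt{1 + (y')^2}}.$$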


Therefore,

$$\frac{\sin(\theta)}{v} = \frac{1}{\sqrt{1 + (y')^2}\,\sqrt{2gy}} = \text{constant}.$$

Squaring the second equality (and inverting) gives

$$2gy\left[1 + (y')^2\right] = \text{constant},$$


or, finally, with C a constant, we arrive at the (nonlinear) differential
equation for the curve of minimum descent time:

$$y\left[1 + \left(\frac{dy}{dx}\right)^2\right] = C.$$

Nonlinear differential equations are generally not easy to solve
analytically (with each new one requiring, it seems, its own unique
“trick”), but we can solve this one for y in the following way. Taking
advantage of Leibniz’s notation (here superior to Newton’s),
and treating the differentials dx and dy as algebraic quantities, we
can solve for dx to get

$$dx = \sqrt{\frac{y}{C - y}}\,dy.$$
Next, making the change of variable to $\varphi$ (notice that $\varphi = 0$ when
y = 0), where

$$\tan(\varphi) = \sqrt{\frac{y}{C - y}} = \frac{\sin(\varphi)}{\cos(\varphi)},$$

we have

$$\frac{y}{C - y} = \frac{\sin^2(\varphi)}{\cos^2(\varphi)},$$

$$y\cos^2(\varphi) = C\sin^2(\varphi) - y\sin^2(\varphi),$$

$$y\cos^2(\varphi) + y\sin^2(\varphi) = y = C\sin^2(\varphi).$$
Differentiation of the last equality with respect to $\varphi$ gives

$$\frac{dy}{d\varphi} = 2C\sin(\varphi)\cos(\varphi),$$

and so $dy = 2C\sin(\varphi)\cos(\varphi)\,d\varphi$, which says

$$dx = \sqrt{\frac{y}{C - y}}\,dy = 2C\sin(\varphi)\cos(\varphi)\tan(\varphi)\,d\varphi.$$


Or, as $\cos(\varphi)\tan(\varphi) = \sin(\varphi)$, we have (using a trigonometric
double-angle identity)

$$dx = 2C\sin^2(\varphi)\,d\varphi = C\left[1 - \cos(2\varphi)\right]d\varphi.$$


This last expression we can integrate by inspection: with $C_1$ as
the constant of indefinite integration, we arrive at

$$x = C\left[\varphi - \frac{\sin(2\varphi)}{2}\right] + C_1 = \frac{1}{2}C\left[2\varphi - \sin(2\varphi)\right] + C_1.$$
We can determine the value of $C_1$ by inserting the coordinates of the
point B at which the descent begins, that is, the origin x = y = 0,
or, equivalently, $x = \varphi = 0$. Then, $C_1$ is obviously zero and so

$$x = \frac{1}{2}C\left[2\varphi - \sin(2\varphi)\right].$$
Earlier we also found that $y = C\sin^2(\varphi) = C\left[1 - \cos^2(\varphi)\right]$ and so,
again from a trigonometric double-angle identity, we have

$$y = \frac{1}{2}C\left[1 - \cos(2\varphi)\right].$$
As our final step, to make the equations as simple-appearing as possible,
I’ll replace the constant $\frac{1}{2}C$ with simply a, and make the change
of variable $\beta = 2\varphi$. Then, at last, we arrive at the so-called parametric
equations for the minimum-descent-time curve, or brachistochrone:

$$x = a\left[\beta - \sin(\beta)\right]$$

$$y = a\left[1 - \cos(\beta)\right]$$


This result greatly surprised Bernoulli, who recognized these equa- 
tions as describing a previously known (for at least a century) curve, 
the cycloid (a name coined by Galileo in 1599), which is the curve 
traced by a point (starting at the origin) on the circumference of
a wheel, with radius a, rolling without slipping along the x-axis. 
Although it seems incredible that the cycloid could have been over- 
looked by the ancient mathematicians, it appears that the first time 
it was discussed in print was in 1501, in the work of the French math- 
ematician Charles Bouvelles (1470-1553). You can find more discus- 
sion in the paper by E. A. Whitman, “Some Historical Notes on the 
Cycloid” (American Mathematical Monthly, May 1943, pp. 309-15). 

The cycloid equations do not directly connect x and y, but rather 
link them together via the parameter $\beta$. We can thus simply vary
$\beta$, calculate x and y for each of many different values of $\beta$, and
arrive at the x, y plot of the cycloid. Figure 6.10 shows such a plot,
for a = 1, in the interval $0 \le \beta \le 2\pi$. It should be clear that a is
simply a scale factor and, as we make a smaller or larger, the curve 
shrinks or inflates, respectively. We can make the cycloid, starting 
at the origin, pass through any given point (x > 0, y > 0) by simply 
picking the constant a properly (start with a = 0 and then increase 
it, i.e., “inflate” the cycloid, until it passes through the given point). 
This should make it obvious, too, that there is a unique value of a that 
does this. Thus, the brachistochrone joining two points is unique, 
and it is an inverted section of the arch of a cycloid. 
FIGURE 6.10. The cycloid, a = 1.
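As a spot check of the algebra, the sketch below evaluates the boxed parametric equations and confirms that they satisfy the differential equation $y[1 + (dy/dx)^2] = C$ with $C = 2a$, consistent with the substitution $a = \frac{1}{2}C$. The helper names are hypothetical, and the slope is taken from the chain rule as $(dy/d\beta)/(dx/d\beta) = \sin(\beta)/[1 - \cos(\beta)]$, valid for $0 < \beta < 2\pi$.

```python
import math

def cycloid_point(a, beta):
    """Boxed parametric equations: x = a(beta - sin beta), y = a(1 - cos beta)."""
    return a * (beta - math.sin(beta)), a * (1 - math.cos(beta))

def ode_residual(a, beta):
    """y * (1 + (dy/dx)^2) - C with C = 2a; zero if the cycloid solves the equation."""
    _, y = cycloid_point(a, beta)
    slope = math.sin(beta) / (1 - math.cos(beta))   # (dy/dbeta) / (dx/dbeta), for 0 < beta < 2*pi
    return y * (1 + slope ** 2) - 2 * a

for beta in (0.3, 1.0, 2.0, 3.0, 5.0):
    print(beta, ode_residual(0.7, beta))   # residuals at the level of rounding error
```

The value a = 0.7 is arbitrary; since a only rescales the curve, any positive a behaves the same way.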


You should not think of the parametric representation of 
a curve as being something less than desirable, as somehow 
being less useful than a direct expression of y in terms of x. We 
will not find ourselves at any disadvantage with a parametric 
representation. For example, if we want to know the slope of 
the cycloid at some point, we simply use the chain rule to 
calculate 


$$\frac{dy}{dx} = \frac{dy}{d\beta}\,\frac{d\beta}{dx} = \frac{dy/d\beta}{dx/d\beta} = \frac{\sin(\beta)}{1 - \cos(\beta)}.$$


There can be occasions, in fact, where the parametric repre- 
sentation is the only proper way to formulate a problem. For 
example, in appendix F you'll find the derivation of an expres- 
sion for the area inside a closed, non-self-intersecting curve: 
if the parametric equations of the curve C are x = x(t) and
y = y(t), then

$$\text{area enclosed by } C = \frac{1}{2}\int_0^T \left(y\frac{dx}{dt} - x\frac{dy}{dt}\right)dt,$$

where C is imagined to be the clockwise path traversed by a
moving point, starting at time t = 0 at some place and returning
to that initial place at time t = T. This result will be crucial
to the solution of the ancient isoperimetric problem discussed
in chapter 2 (what figure of given perimeter encloses the maximum
area?), a problem we will finally be able to solve in this
chapter.
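The area formula is easy to try numerically. This sketch is our own (the function name and the sample curves are illustrative choices): it applies the composite midpoint rule to $\frac{1}{2}\int_0^T (y\,dx/dt - x\,dy/dt)\,dt$ for a curve traversed clockwise, here the ellipse $x = 3\cos(t)$, $y = -\sin(t)$, whose area is $3\pi$.

```python
import math

def enclosed_area(x, y, dxdt, dydt, T: float, n: int = 2000) -> float:
    """Approximate (1/2) * integral_0^T (y dx/dt - x dy/dt) dt with the midpoint rule.

    Assumes the point (x(t), y(t)) traverses the closed curve once, clockwise,
    as t runs from 0 to T.
    """
    h = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += y(t) * dxdt(t) - x(t) * dydt(t)
    return 0.5 * total * h

# Ellipse with semi-axes 3 and 1, traversed clockwise as t increases.
area = enclosed_area(lambda t: 3 * math.cos(t), lambda t: -math.sin(t),
                     lambda t: -3 * math.sin(t), lambda t: -math.cos(t),
                     2 * math.pi)
print(area, 3 * math.pi)
```

Traversing the same curve counterclockwise flips the sign of the result, which is why the direction of travel has to be specified along with the formula.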


Johann Bernoulli’s brother Jacob (1654-1705), Leibniz, and New- 
ton also submitted solutions in response to Johann’s challenge. 
Bernoulli’s challenge to Newton, in particular, was not really a 
friendly one. Bernoulli had taken Leibniz’s side in the dispute over 
who was the “true” discoverer of the calculus, and he meant to em- 
barrass Newton by showing that he was unable to solve a problem 
that both Bernoulli and Leibniz had already solved. As Bernoulli 
stated in the public announcement of the brachistochrone prob- 
lem, “so few have appeared to solve our extraordinary problem, 
even among those who boast that through special methods, which 
they commend so highly, they have not only penetrated the deepest 
secrets of geometry but also extended its boundaries in marvelous 
fashion; although their golden theorems which they imagine were 
known to no one, have been published by others long before.” 

Newton was not amused by this; as he later stated, “I do not 
love to be dunned and teased by foreigners about Mathematical 
things.” Newton quickly set about answering Bernoulli’s challenge 
and, according to second-hand accounts, solved the problem in a 
single night using a then unknown method (but see the box in 
section 6.4). Newton’s “solution,” however, is simply a description 
for how to construct the minimum-descent-time cycloid, with no 
explanation for how he arrived at that curve as the brachistochrone. 
The construction was published anonymously in the Philosophical 
Transactions of the Royal Society of January 1697 (backdated by his 
editor/friend Edmond Halley, as Newton actually first read aloud his
“solution” at a meeting of the Royal Society on February 24, 1697). 

A famous story about the anonymous publication is that, after 
reading it, Johann claimed he knew the unnamed author was New- 
ton because he “recognized the lion by his paw.” For once in his 
life Johann Bernoulli, despite his bias against Newton, was gracious 
to a competing mathematician working on the same problem, per- 
haps because in this case Bernoulli clearly had priority. [However, for 
a more sympathetic view of Johann Bernoulli’s relationships with 
competing mathematicians see the old but still valuable paper by 
Constantin Caratheodory, “The Beginning of Research in the Cal- 
culus of Variations” (Osiris, 1938, pp. 224—40)]. 

The brachistochrone has a second remarkable property, in ad- 
dition to being the curve of minimum descent time. In 1656 the 
Dutch mathematical physicist Christiaan Huygens (1629-95) con- 
structed the first successful pendulum clock, which he knew had 
a period slightly dependent on the amplitude of the pendulum 
swing. To achieve complete independence, i.e., to invent the so- 
called isochronous pendulum clock, Huygens inserted curved metal 
surfaces at the suspension point on each side of the flexible cord 
that (along with a weight at the end) served as the pendulum. These 
surfaces forced the pendulum cord to deviate from being straight 
as it swung back and forth, in just such a way as to make the pe- 
riod independent of the amplitude of the swing. In his 1673 mas- 
terpiece, Horologium Oscillatorium (The Pendulum Clock), Huygens 
showed that the curved constraint surfaces should be cycloidal arcs 
(he had actually known this since the end of 1659). That would force
the swinging weight to follow a cycloidal path (a mathematician 
would say that Huygens had discovered that the involute of a cy- 
cloid is another cycloid), which was known to be isochronous, i.e., a 
bead undergoing gravitational descent along a cycloidal curve takes 
the same time to reach the bottom of the curve, no matter where it 
starts its descent (this is shown in the next section). This means the
brachistochrone is also a tautochrone (from the Greek tauto, the same,
and, of course, chronos, time), a discovery that so pleased Huygens
he said it was “the most fortunate finding which ever befell me.” 
In actual practice, however, the friction between the curved metal 
surfaces and the pendulum cord resulted in a bigger source of timekeeping
error than was the original amplitude-period dependency.
As I mentioned earlier, Bernoulli was astonished to learn the
brachistochrone is a cycloid, and so, when he revealed his derivation
in January 1697, he first discussed Huygens’s cycloid and its
tautochronous property and then stated, “you will be petrified with 
astonishment when I say that precisely this same cycloid .. . is our 
required brachistochrone. .. . Nature always tends to act in the sim- 
plest way [certainly Bernoulli would say this, since he had used Fer- 
mat’s principle of least time in arriving at his solution], and so it 
here lets one curve serve two different functions.” 


6.3. Comparing Galileo and Bernoulli 


Now that we have the analytic form of the true minimum-descent- 
time curve, the next natural question to ask is how much faster is it 
than Galileo’s circular descent curve? We found in section 6.1 that, 
on a quarter circle of radius L, it takes the time T for the bead to
make the descent, where

$$T = 1.8541\sqrt{\frac{L}{g}}.$$


Galileo didn’t actually calculate this result, but he came close to 
it, and so I’ll now write T as $T_G$. What we want to calculate now is $T_B$,
the time to fall along the brachistochrone curve from (0, 0) to (L, L).
(Note carefully: this $T_B$ is not the $T_B$ of section 6.1!) Everything we’ve
done so far tells us $T_B < T_G$. Let’s see by how much.

If we define s as the distance from the origin to the arbitrary point 
(x, y) on the descent curve, as measured along the curve, then, as 
argued before from the conservation of energy, we have

$$\frac{ds}{dt} = v = \sqrt{2gy},$$

where, from the Pythagorean theorem, we have the differential arc
length ds along the curve as $ds = \sqrt{(dx)^2 + (dy)^2}$. Thus,

$$dt = \frac{ds}{v} = \frac{\sqrt{(dx)^2 + (dy)^2}}{v} = \frac{\sqrt{1 + \left(\frac{dy}{dx}\right)^2}}{\sqrt{2gy}}\,dx,$$

and so

$$dt = \sqrt{\frac{1 + \left(\frac{dy}{dx}\right)^2}{2gy}}\,dx.$$

Integrating, where, as t goes from 0 to $T_B$ we have x go from 0 to L,

$$T_B = \int_0^L \sqrt{\frac{1 + \left(\frac{dy}{dx}\right)^2}{2gy}}\,dx.$$

Because we already have the equations relating y and x (the para- 
metric equations of the cycloid), we can now directly evaluate this 
integral, as I’ll do next. But first, notice that we have arrived at this 
integral (the so-called functional) without using our knowledge of 
the specific relationship between y and x. Indeed, the general ap- 
proach of the calculus of variations (which we’ll take up in the next 
section) does not require that knowledge, but instead derives the 
brachistochrone by determining the function y(x) that minimizes 
the time functional. For now, however, let’s evaluate $T_B$ directly.

From the boxed parametric equations for the brachistochrone 
given in the previous section, we have 


$$\frac{dx}{d\beta} = a\left[1 - \cos(\beta)\right],$$

$$\frac{dy}{d\beta} = a\sin(\beta).$$


We have $\beta = 0$ when x = 0 from the definition of $\beta$, and let’s further
suppose that $\beta = B$ when x = L. Then,


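$$T_B = \int_0^L \sqrt{\frac{1 + \left(\frac{dy}{dx}\right)^2}{2gy}}\,dx = \int_0^B \sqrt{\frac{(dx/d\beta)^2 + (dy/d\beta)^2}{2gy}}\,d\beta$$

$$= \int_0^B \sqrt{\frac{a^2\left[1 - \cos(\beta)\right]^2 + a^2\sin^2(\beta)}{2ga\left[1 - \cos(\beta)\right]}}\,d\beta = \int_0^B \sqrt{\frac{2a^2\left[1 - \cos(\beta)\right]}{2ga\left[1 - \cos(\beta)\right]}}\,d\beta = \int_0^B \sqrt{\frac{a}{g}}\,d\beta = B\sqrt{\frac{a}{g}}.$$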
To find B (which certainly must be greater than zero) and a, we
use the fact that the brachistochrone ends at (L, L); remember, as
in figure 6.9, we are thinking of the positive y-axis as increasing
downward. Thus,

$$L = a\left[B - \sin(B)\right]$$

$$L = a\left[1 - \cos(B)\right],$$

and so

$$a = \frac{L}{B - \sin(B)} = \frac{L}{1 - \cos(B)}.$$


The second equality is equivalent to solving the equation 


$$f(B) = B + \cos(B) - \sin(B) - 1 = 0.$$


A plot of f(B) is shown in figure 6.11, which tells us there is just one
positive solution to f(B) = 0. Using the Newton-Raphson iterative
method discussed in section 4.5, it is easy to calculate that solution
to be B = 2.412 radians. Thus,


$$a = \frac{L}{1 - \cos(2.412)} = \frac{L}{2.412 - \sin(2.412)} = 0.5729L,$$

$$T_B = 2.412\sqrt{\frac{0.5729L}{g}} = 1.8257\sqrt{\frac{L}{g}},$$

which is, indeed, less than $T_G$. But only by about 1.5%. Galileo’s
quarter-circle is pretty close to being the brachistochrone.
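Both numbers in this comparison can be reproduced in a few lines. The sketch below is our own: a bare-bones Newton-Raphson loop (the `newton` helper is hypothetical, not a library routine) solves f(B) = 0, after which the code computes the coefficient of $\sqrt{L/g}$ in $T_B = B\sqrt{a/g}$. For the quarter-circle coefficient it does not follow appendix E; instead it assumes the bead starts at rest level with the circle’s center, so that $T_G = \int_0^{\pi/2} L\,d\varphi/\sqrt{2gL\sin(\varphi)}$, and evaluates that integral with the standard Beta-function result $\int_0^{\pi/2}\sin^{-1/2}(\varphi)\,d\varphi = \Gamma(\frac{1}{4})\Gamma(\frac{1}{2})/[2\Gamma(\frac{3}{4})]$.

```python
import math

def newton(f, fprime, x0: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Plain Newton-Raphson iteration x <- x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

def f(B):
    return B + math.cos(B) - math.sin(B) - 1

def fprime(B):
    return 1 - math.sin(B) - math.cos(B)

B = newton(f, fprime, 2.5)          # start near the crossing seen in figure 6.11
a_over_L = 1 / (1 - math.cos(B))    # a = L / (1 - cos B)
T_B = B * math.sqrt(a_over_L)       # T_B = B sqrt(a/g), as a multiple of sqrt(L/g)

# Quarter circle: T_G = (1/sqrt 2) * integral_0^{pi/2} (sin phi)^(-1/2) dphi, times sqrt(L/g).
T_G = (1 / math.sqrt(2)) * math.gamma(0.25) * math.gamma(0.5) / (2 * math.gamma(0.75))

print(B, a_over_L, T_B, T_G, 1 - T_B / T_G)
```

The starting guess matters only loosely: f′(B) vanishes at B = 0 and B = π/2, so starting well to the right of π/2 keeps the iteration away from those points.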

We can show the isochronous property of the cycloid as follows.
For a cycloid starting at (0, 0) and ending at the very bottom of the
cycloidal path, we have (from before)

$$T = \int_0^B \sqrt{\frac{a}{g}}\,d\beta,$$

and so

$$T = B\sqrt{\frac{a}{g}},$$

where now B is the value of $\beta$ at the bottom. (For the brachistochrone
joining (0, 0) to (L, L), the problem we just analyzed, the point (L, L)
is not the bottom of the cycloidal path.) From the parametric equations
of the cycloid, we see that this means $B = \pi$: at the bottom, $x = \pi a$
and $y = 2a$ (take a look again at figure 6.10). Thus, the time required
for a bead to slide from top to bottom is

$$T = \pi\sqrt{\frac{a}{g}}.$$

FIGURE 6.11. Estimating B when x = L: a plot of $f(B) = B + \cos(B) - \sin(B) - 1$ against B (in radians).


If the fall along the cycloid does not start at (0, 0), however, but
rather at some lower point $(x_0, y_0)$ on the cycloid, then the speed of
the descending bead, at the general point (x, y), is

$$v = \sqrt{2g(y - y_0)},$$

and so the time to reach the bottom is now given by

$$T' = \int_{x_0}^{\pi a} \sqrt{\frac{1 + \left(\frac{dy}{dx}\right)^2}{2g(y - y_0)}}\,dx.$$

The isochronous property discovered by Huygens says $T' = T$.


Here’s why. 


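Inserting dx and dy in terms of $\beta$, as we did before, and changing
the integration limits to the appropriate values for $\beta$ (let $\beta = \beta_0$ at
$(x_0, y_0)$), we have

$$T' = \int_{\beta_0}^{\pi} \sqrt{\frac{(dx/d\beta)^2 + (dy/d\beta)^2}{2g(y - y_0)}}\,d\beta = \int_{\beta_0}^{\pi} \sqrt{\frac{a^2\left[1 - \cos(\beta)\right]^2 + a^2\sin^2(\beta)}{2g\left[\{a - a\cos(\beta)\} - \{a - a\cos(\beta_0)\}\right]}}\,d\beta$$

$$= \int_{\beta_0}^{\pi} \sqrt{\frac{2a^2\left[1 - \cos(\beta)\right]}{2ag\cos(\beta_0) - 2ag\cos(\beta)}}\,d\beta = \sqrt{\frac{a}{g}}\int_{\beta_0}^{\pi} \sqrt{\frac{1 - \cos(\beta)}{\cos(\beta_0) - \cos(\beta)}}\,d\beta.$$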
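From the half-angle trigonometric identity

$$\sin\left(\tfrac{1}{2}\beta\right) = \sqrt{\frac{1 - \cos(\beta)}{2}}$$

we then have

$$T' = \sqrt{\frac{a}{g}}\int_{\beta_0}^{\pi} \frac{\sqrt{2}\,\sin\left(\frac{1}{2}\beta\right)}{\sqrt{\cos(\beta_0) - \cos(\beta)}}\,d\beta.$$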


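And from the half-angle identity

$$\cos\left(\tfrac{1}{2}\beta\right) = \sqrt{\frac{1 + \cos(\beta)}{2}},$$

we have $\cos(\beta) = 2\cos^2(\frac{1}{2}\beta) - 1$, and so

$$T' = \sqrt{\frac{a}{g}}\int_{\beta_0}^{\pi} \frac{\sqrt{2}\,\sin\left(\frac{1}{2}\beta\right)}{\sqrt{2\cos^2\left(\frac{1}{2}\beta_0\right) - 2\cos^2\left(\frac{1}{2}\beta\right)}}\,d\beta = \sqrt{\frac{a}{g}}\int_{\beta_0}^{\pi} \frac{\sin\left(\frac{1}{2}\beta\right)}{\sqrt{\cos^2\left(\frac{1}{2}\beta_0\right) - \cos^2\left(\frac{1}{2}\beta\right)}}\,d\beta.$$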


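If we now change the integration variable to

$$u = \frac{\cos\left(\frac{1}{2}\beta\right)}{\cos\left(\frac{1}{2}\beta_0\right)},$$

then

$$\frac{du}{d\beta} = -\frac{\sin\left(\frac{1}{2}\beta\right)}{2\cos\left(\frac{1}{2}\beta_0\right)},$$

and so (since u = 1 when $\beta = \beta_0$ and u = 0 when $\beta = \pi$) the $T'$ integral becomes

$$T' = 2\sqrt{\frac{a}{g}}\int_0^1 \frac{du}{\sqrt{1 - u^2}}.$$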


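From integral tables, we find this integral is $\sin^{-1}(u)$, and so

$$T' = 2\sqrt{\frac{a}{g}}\left\{\sin^{-1}(u)\right\}\Big|_0^1 = 2\sqrt{\frac{a}{g}}\left\{\sin^{-1}(1) - \sin^{-1}(0)\right\} = 2\sqrt{\frac{a}{g}}\left(\frac{\pi}{2} - 0\right) = \pi\sqrt{\frac{a}{g}} = T,$$

as claimed.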
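The isochronous property can also be checked numerically, as a sketch under our own choices. The integral $\int_{\beta_0}^{\pi}\sqrt{[1 - \cos(\beta)]/[\cos(\beta_0) - \cos(\beta)]}\,d\beta$ has an integrable singularity at $\beta = \beta_0$, so the code substitutes $\beta = \beta_0 + (\pi - \beta_0)t^2$, which cancels it, and then applies the midpoint rule in t. It also writes $1 - \cos(\beta) = 2\sin^2(\frac{1}{2}\beta)$ and $\cos(\beta_0) - \cos(\beta) = 2\sin(\frac{1}{2}(\beta + \beta_0))\sin(\frac{1}{2}(\beta - \beta_0))$ to avoid floating-point cancellation. If the derivation is right, every starting value $\beta_0$ gives $\pi$, so that $T' = \pi\sqrt{a/g} = T$. The function name is hypothetical.

```python
import math

def isochrone_integral(beta0: float, n: int = 20000) -> float:
    """Approximate integral_{beta0}^{pi} sqrt((1 - cos b) / (cos beta0 - cos b)) db.

    Substituting b = beta0 + (pi - beta0) * t**2 removes the 1/sqrt singularity
    at b = beta0; the smooth result is integrated over 0 < t < 1 by the midpoint rule.
    """
    span = math.pi - beta0
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        d = span * t * t                     # d = b - beta0, computed without cancellation
        b = beta0 + d
        num = math.sin(b / 2) ** 2           # (1 - cos b) / 2
        den = math.sin((b + beta0) / 2) * math.sin(d / 2)   # (cos beta0 - cos b) / 2
        total += math.sqrt(num / den) * 2 * span * t        # db = 2 * span * t dt
    return total * h

for beta0 in (0.0, 0.5, 1.5, 2.5, 3.0):
    print(beta0, isochrone_integral(beta0))   # each value should be very close to pi
```

Multiplying each printed value by $\sqrt{a/g}$ gives $T'$ for a bead released at $\beta_0$; the printed values agreeing with $\pi$ is the tautochrone property.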
With a knowledge of the chain rule in differentiation, we
can actually derive the above integral easily, with no need for
tables. In figure 6.12, I’ve drawn a right triangle such that the
angle $\varphi$ is given by $\sin(\varphi) = u$, i.e., $\varphi = \sin^{-1}(u)$. Thus,

$$\frac{d\varphi}{du} = \frac{d}{du}\sin^{-1}(u).$$

From the chain rule, and the figure,

$$\frac{d}{du}\sin(\varphi) = \cos(\varphi)\frac{d\varphi}{du} = \sqrt{1 - u^2}\,\frac{d\varphi}{du}.$$

But of course we also have

$$\frac{d}{du}\sin(\varphi) = \frac{du}{du} = 1.$$

Thus,

$$\sqrt{1 - u^2}\,\frac{d\varphi}{du} = 1,$$

and so

$$\frac{d}{du}\sin^{-1}(u) = \frac{1}{\sqrt{1 - u^2}}.$$

Integrating both sides then immediately gives us

$$\int \frac{du}{\sqrt{1 - u^2}} = \sin^{-1}(u).$$

FIGURE 6.12. Differentiating the inverse sine function.


Bernoulli was certainly correct in saying the cycloid has a fasci- 
nation all of its own, even for nonmathematicians. Its isochronous 
property, for example, received attention in, of all places, a famous 
work of fiction, Herman Melville’s 1851 classic whaling story Moby- 
Dick. In chapter 96 (“The Try-Works”), where the book’s narrator (re- 
member him?—“Call me Ishmael”) is describing how the try-pots of 
the ship Pequod are cleaned (a try-pot is an enormous iron cauldron 
used to reduce whale blubber to liquid oil), we read the following 
passage: 


...an American whaler is outwardly distinguished by her try- 
works. ... The try-works are planted between the foremast and 
main-mast, the most roomy part of the deck. The timbers beneath 
are of a peculiar strength, fitted to sustain the weight of an almost 
solid mass of brick and mortar, some ten feet by eight square, and 
five in height. The foundation does not penetrate the deck, but 
the masonry is firmly secured to the surface by ponderous knees 
of iron bracing it on all sides, and screwing it down to the timbers. 
On the flanks it is cased with wood, and at top completely covered 
by a large, sloping, battened hatchway. Removing this hatch we 
expose the great try-pots, two in number, and each of several bar- 
rels’ capacity. When not in use, they are kept remarkably clean. 
Sometimes they are polished with soapstone and sand, till they 
shine within like silver punch bowls. During the night watches 
some cynical old sailors will crawl into them and coil themselves 
away there for a nap. While employed in polishing them—one 
man in each pot, side by side—many confidential communica- 
tions are carried on, over the iron lips. It is a place also for profound 
mathematical meditation. It was in the left hand try-pot of the Pequod,
with the soapstone diligently circling around me, that I was first indi- 
rectly struck by the remarkable fact, that in geometry all bodies gliding 
along the cycloid, my soapstone for example, will descend from any 
point in precisely the same time [my emphasis]. 


The problem of determining the curve of swiftest descent is not 
simply one of historical interest. It has reappeared over the centuries 
in various forms, right up to modern times (it represents the ulti- 
mate in fast roller coaster rides, for example, especially right at the 
start with a vertical drop!), and it continues to capture the imagination.
For example, in 1966, Paul W. Cooper, an industrial mathematician,
published a short note in the American Journal of Physics
(“Through the Earth in Forty Minutes,” January, pp. 68-70). In his 
paper, Cooper pointed out that the gravitational field of the interior 
of the Earth would allow “falling through” a frictionless, straight 
tunnel connecting any two points on the surface of the planet in the 
same time interval of 42.2 minutes. Cooper imagined “a transporta- 
tion system without timetables wherein the world’s cities are linked 
with chords and where the departure time is universally on the hour 
and arrival time forty-two minutes later. Such a chord link between 
Boston and Washington, D.C., would involve a maximum penetra- 
tion of about 50 miles below the Earth’s surface.” [For a somewhat 
more realistic tunnel transportation system, see the earlier paper by 
L. K. Edwards, “High-Speed Tube Transportation” (Scientific Ameri- 
can, August 1965, pp. 30-40).] 

Straight tunnels do not define the fastest travel times, however, 
and Cooper also wrote “One might want, in fact, the actual minimal- 
time path. . . . This is more complex than the classic brachistochrone 
problem in that here [by here Cooper means inside the Earth] the 
gravitational field is radial instead of rectangular, and it is not uni- 
form.” The brachistochrone tunnel connecting Los Angeles and 
New York City, for example, has a travel time of just 28 minutes—but 
it comes with a high price. The curved tunnel dips 1,000 miles below 
the surface! See J. E. Prussing, “Brachistochrone-tautochrone Prob- 
lem in a Homogeneous Sphere” (American Journal of Physics, March 
1976, pp. 304-5).

Cooper’s paper on straight tunnels was noticed by Time magazine 
(February 11, 1966, pp. 42-43) and given a science fiction flavor 
for its popular audience, which prompted a number of physicists to
write the AJP to say the whole idea was actually an old idea (see the 
replies in the August 1966 issue of the American Journal of Physics, 
pp. 701-4). Indeed, one writer traced it back to a paper delivered 
to the French Association for the Advancement of Science, in 1883! 
That paper may well have been the inspiration for Lewis Carroll, 
who used the concept in chapter 7 of his novel Sylvie and Bruno 
Concluded ten years later. 

Three years after Cooper’s paper appeared, a brief solution to the 
brachistochrone problem inside the Earth was given in the American 
Mathematical Monthly (“Fast Tunnels through the Earth,” June-July 
1969, pp. 708-9). That solution uses the modern calculus of varia- 
tions approach. Twelve years after that, P. K. Aravind, a chemist (!) 
at the University of California/Santa Barbara, showed how to use 
Bernoulli’s original optical analogy approach to solve the interior 
problem (“Simplified Approach to Brachistochrone Problems,” 
American Journal of Physics, September 1981, pp. 884-86). And fi- 
nally, the brachistochrone problem can be solved in closed form 
even if the additional complication of friction is included (we've ig- 
nored that important reality in all that we’ve done in this chapter). 
It is not a trivial exercise, however, and I’ll simply refer you to the 
paper by N. Ashby et al., “Brachistochrone with Coulomb Friction,” 
American Journal of Physics, October 1975, pp. 902-6. 


In this section we saw, for the first time, the formula for the 
length of a curve traced out by a moving point. If that motion 
is described by x = x(t) and y = y(t), then the path length
traveled over the time interval t = 0 to t = T is

$$\int_0^T \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt.$$
An interesting extrema problem uses this result: if Tiger Woods 
wants to hit a golf ball for maximum range, then we showed 
in section 5.4 that he should drive the ball off of its tee at a 45° 
angle. But suppose instead that he wants to drive the ball for 
maximum distance through space, i.e., for maximum trajectory 
length? Then the angle is not 45°, but rather the not-so-obvious
56.466°. You can find it all worked out in the paper by Ze-Li 
Dou and Susan G. Staples, “Maximizing the Arclength in the 
Cannonball Problem” (The College Mathematics Journal, January 
1999, pp. 44-45). 
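The 56.466° figure is easy to check numerically under the usual drag-free model (no air resistance, launch and landing at the same height, which is also what the 45° range result assumes). The sketch below evaluates the path-length integral above with Simpson's rule for the trajectory x = v0 t cos(θ), y = v0 t sin(θ) - g t²/2, then locates the maximizing θ by golden-section search. The launch speed and g are hypothetical values; the optimal angle does not depend on them.

```python
import math

V0 = 50.0   # launch speed in m/s (hypothetical value)
G = 9.81    # gravitational acceleration in m/s^2 (assumed value)

def arc_length(theta, v0=V0, g=G, n=2000):
    """Length of a drag-free trajectory launched at angle theta (radians).

    Integrates sqrt(x'(t)^2 + y'(t)^2) from t = 0 to the landing time
    with composite Simpson's rule (n must be even).
    """
    t_flight = 2.0 * v0 * math.sin(theta) / g
    h = t_flight / n

    def speed(t):
        return math.hypot(v0 * math.cos(theta), v0 * math.sin(theta) - g * t)

    total = speed(0.0) + speed(t_flight)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * speed(i * h)
    return total * h / 3.0

def golden_max(f, a, b, tol=1e-6):
    """Golden-section search for the maximizer of a unimodal f on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

best = golden_max(arc_length, 0.1, 1.5)
print(f"max-length angle: {math.degrees(best):.3f} deg")
print(f"length at 45 deg: {arc_length(math.radians(45)):.2f} m, at optimum: {arc_length(best):.2f} m")
```

The printed angle should agree with Dou and Staples's value to within the search tolerance.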


6.4 The Euler-Lagrange Equation 


The brachistochrone problem is generally accepted by historians as 
marking the beginning of the calculus of variations. This, despite 
the fact that Bernoulli’s solution, using Fermat’s principle of least 
time and Snell’s law, does not use the methods of that yet-to-be-developed
subject. The reason is that it was quickly understood by all
that, while undeniably brilliant, Bernoulli’s solution by
optical analogy was too specialized, with no hope of being extended 
to other such questions, e.g., to the ancient isoperimetric question, 
discussed in chapter 2, of what closed curve, of given length, en- 
closes the maximum area? What was needed was a general theory 
to attack such problems, and the brachistochrone problem itself, not 
Bernoulli’s particular solution of it, was the spark that initiated the 
search for that theory. 

Sometimes one does read of an earlier problem that is said to actu- 
ally be the first such problem in the calculus of variations, but its his- 
tory is a murky one indeed. This is the question, briefly mentioned 
in Newton’s Principia (in 1687, nine years before Bernoulli’s brachis- 
tochrone challenge), of what solid of revolution would experience 
the least resistance to motion through a medium with certain phys- 
ical properties (e.g., a ship’s hull in water)? As did his later brachis- 
tochrone “solution,” Newton’s answer to the minimum-resistance 
problem appeared in the original Latin printing of the Principia as 
just that: an answer with no derivation (as the Scholium to Proposition
34 of Book 2). This has led some modern writers to conclude
(oddly and without justification, in my opinion) that Newton had 
no proofs! See, for example, the paper by Robert Weinstock, “Isaac 
Newton: Credit Where Credit Won’t Do,” and the replies to it, in 
The College Mathematics Journal (May 1994, pp. 179-222). Neither 
Professor Weinstock nor his critics seem to be aware of the fact that 
when the English translation of the Principia appeared in 1729, the
minimum-resistance solid was treated analytically (those calculus- 
based arguments were included without attribution, but it was es- 
tablished in 1888 that they were, indeed, from Newton—see the 
following box). 


The analytical treatment appearing in the 1729 edition of 
the Principia of the minimum resistance solid, discovered in 
Newton’s papers in 1888, was prompted by a request he re- 
ceived from a reader of the original Latin printing of the Prin- 
cipia. Newton replied with the requested analysis in a letter 
dated July 14, 1694, to David Gregory (1659-1708), two years 
before Bernoulli’s challenge; the method used is easily extended 
to the brachistochrone problem. Gregory, a Scot who ended his 
career as the Savilian Professor of Astronomy at Oxford, had a 
reputation as a not very outstanding mathematician. His entry 
in the Dictionary of Scientific Biography tells us, for example, that 
“the impression gained from his printed work [is] that a mod- 
icum of talent, effectively lacking originality, was stretched a 
long way.” The last paragraph of that same entry also makes 
it clear, however, that Gregory has all math historians in his 
debt: “In retrospect, Gregory’s true role in the development of 
seventeenth-century science was not that of original innova- 
tor but that of custodian of certain precious papers and verbal 
communications passed to him. . . as privileged information, 
by Newton.” 

It seems clear (to me, at least) that Newton simply thought 
both the minimum-resistance-solid problem and the minimum- 
descent-time curve problem to be interesting but not worthy 
of lengthy elaboration. This remarkable pair of decisions si- 
multaneously illustrates both his monumental genius as well 
as an even more monumental mistake in judgment! Without 
Gregory’s request, we might well never have learned the de- 
tails of Newton’s solutions. You can find discussions of New- 
ton’s “missing” solutions in H. W. Turnbull, The Mathematical 
Discoveries of Newton (Blackie & Son 1945, pp. 39-42), in Her- 
man H. Goldstine, A History of the Calculus of Variations from the 
17th through the 19th Century (Springer-Verlag 1980, pp. 7-29),
and in I. Bernard Cohen, “Isaac Newton, the Calculus of Vari- 
ations, and the Design of Ships,” in For Dirk Struik (D. Reidel 
1974, pp. 169-87). 


As with Bernoulli’s solution to the brachistochrone problem, 
Newton’s solution to the minimum-resistance solid (using a method 
easily extended to the brachistochrone problem, which means he 
surely did derive the cycloid solution) is not the proper basis for at- 
tacking other functional problems in general. The development of a 
general theory began with the Swiss genius Leonhard Euler (1707- 
83), a student of Bernoulli who went on to exceed his mentor. 

What I'll do next, then, is derive the basis for just such a general 
approach to these problems, the so-called Euler-Lagrange equation. 
The presentation that follows is the modern one found in textbooks 
today, and is fairly close to the way it was first done by the French- 
Italian mathematical-physicist Joseph Louis Lagrange (1736-1813). 
The equation was known to Euler by 1736, but today it is univer- 
sally derived in the same way that Lagrange did it (in a 1755 letter 
to Euler when Lagrange was, yes, just nineteen!), which Euler en- 
thusiastically adopted as the superior approach. Lagrange’s use of a 
“variational” technique prompted Euler to coin the name calculus of 
variations. For the historically minded, excellent papers to read for 
the detailed history are G. A. Bliss, “The Evolution of Problems of the 
Calculus of Variations” (American Mathematical Monthly, December 
1936, pp. 598-609) and Craig G. Fraser, “Isoperimetric Problems in 
the Variational Calculus of Euler and Lagrange” (Historia Mathemat- 
ica, February 1992, pp. 4-23). 

The simplest form of our general, fundamental problem is easy
to state: find the function y(x) that minimizes the integral (or functional)

$$J = \int_{x_1}^{x_2} F\{x, y(x), y'(x)\}\,dx,$$

where $x_1$ and $x_2$ are given, the function F is given, and $y'(x) = \frac{d}{dx}y(x)$. Many of the classical problems of the calculus of variations can be put in this form. For example, recall from the previous section the expression for T, the descent time along the curve y = y(x) from (0,0) to (L,L):

$$T = \int_0^L \sqrt{\frac{1 + (y')^2}{2gy}}\,dx.$$


We were able, there, to directly evaluate the minimum descent time 
(the minimum of the integral) because we already knew from other 
considerations what the equations for the minimum-descent-time 
curve are (the brachistochrone is a cycloid). Soon, however, we will 
redo this problem by a direct minimization of the integral. In this 
particular case, we have 


$$F = \sqrt{\frac{1 + (y')^2}{2gy}}.$$
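
To see that J really is a single number attached to a whole curve, here is a small numerical sketch (an illustration of our own) that evaluates the descent-time integral T for two curves from (0,0) to (L,L), with y measured downward: the straight line and the cycloid through the endpoint. Because the integrand blows up at y = 0, the sketch integrates with respect to a curve parameter using the midpoint rule, which never evaluates that endpoint. The values of L and g are hypothetical.

```python
import math

G = 9.81   # m/s^2 (assumed value)
L = 1.0    # endpoint is (L, L), y measured downward (hypothetical value)

def descent_time(x_dot, y_dot, y, t0, t1, g=G, n=20000):
    """T = integral of ds / sqrt(2 g y) along a curve given parametrically.

    x_dot, y_dot: derivatives of x(t), y(t); y: the vertical drop y(t).
    Midpoint rule, so the singular endpoint y = 0 is never evaluated.
    """
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h
        total += math.hypot(x_dot(t), y_dot(t)) / math.sqrt(2.0 * g * y(t))
    return total * h

# Straight line y = x, parametrized as x = y = L s^2 so the integrand stays finite.
t_line = descent_time(lambda s: 2 * L * s, lambda s: 2 * L * s,
                      lambda s: L * s * s, 0.0, 1.0)

# Cycloid x = a(phi - sin phi), y = a(1 - cos phi) through (L, L):
# find phi_end with phi - sin(phi) = 1 - cos(phi) by bisection on [pi/2, pi].
lo, hi = math.pi / 2, math.pi
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mid - math.sin(mid) - (1 - math.cos(mid)) < 0:
        lo = mid
    else:
        hi = mid
phi_end = 0.5 * (lo + hi)
a = L / (1 - math.cos(phi_end))
t_cyc = descent_time(lambda p: a * (1 - math.cos(p)), lambda p: a * math.sin(p),
                     lambda p: a * (1 - math.cos(p)), 0.0, phi_end)

print(f"straight line: {t_line:.4f} s (closed form 2*sqrt(L/g) = {2 * math.sqrt(L / G):.4f} s)")
print(f"cycloid:       {t_cyc:.4f} s (closed form phi*sqrt(a/g) = {phi_end * math.sqrt(a / G):.4f} s)")
```

The cycloid's time comes out smaller, as it must if the cycloid minimizes this functional.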


In this case, as will be true all through this chapter, F is a known 
function of x, y, and y’, but y is not a known function of x. Indeed, 
that is our problem: what is y = y(x) to minimize J? In this book 
we are mostly interested in the pioneering problems of extrema, and 
the above integral J is almost all we need to consider. I say almost 
because I will eventually make two extensions to the above problem 
statement, but I’ll save them for later. So, let’s begin. 

In figure 6.13, I’ve drawn the curve defined by y = y(x), in the interval $x_1 \le x \le x_2$, where we will take that y(x) to be the actual solution curve we are after. Around it, as the dashed curve, is

$$Y(x) = y(x) + \varepsilon\eta(x),$$

where $\varepsilon$ is any constant and $\eta(x)$ is an almost arbitrary (but always differentiable, as we’ll also assume y(x) to be) function. I say almost because we will put two constraints on $\eta(x)$: it must vanish at the endpoints, i.e., $\eta(x_1) = \eta(x_2) = 0$. You’ll soon see why this is a desirable property for $\eta(x)$. Notice, too, that Y(x) = y(x) if $\varepsilon = 0$, which will be useful to remember in just a bit. That is, Y(x) is a perturbed version of the solution y(x), and $\varepsilon\eta(x)$ is the variation of Y(x) around y(x).

FIGURE 6.13. A true solution and a variation around it.

Because J will, in general, depend on the value of $\varepsilon$, we can write our formulation of the general problem as: find the y(x) that minimizes the integral

$$J(\varepsilon) = \int_{x_1}^{x_2} F\{x, Y(x), Y'(x)\}\,dx,$$

where

$$Y(x) = y(x) + \varepsilon\eta(x), \qquad Y'(x) = y'(x) + \varepsilon\eta'(x).$$

$J = J(\varepsilon)$, since Y and Y’ depend on $\varepsilon$. Now, since we have intentionally constructed this formulation so that by definition Y(x) collapses to the solution y(x) when $\varepsilon = 0$, the $J(\varepsilon)$ integral is minimized (by definition!) when $\varepsilon = 0$. Thus, it must be true that

$$\left.\frac{dJ}{d\varepsilon}\right|_{\varepsilon=0} = 0,$$


because this is the necessary (although of course not sufficient) condition for an extremum (e.g., a minimum) to exist. Whether an extremum is a maximum or a minimum will generally be obvious from the physics of the particular problem we will be studying.
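A quick numerical experiment makes the condition on dJ/dε concrete. The sketch assumes the arc-length integrand $F = \sqrt{1 + (y')^2}$ (the subject of section 6.5), its known minimizer y(x) = x on [0, 1], and a hypothetical perturbation η(x) = x²(1 - x), which vanishes at both endpoints as required. It evaluates J(ε) with Simpson's rule and estimates dJ/dε at ε = 0 with a central difference.

```python
import math

def J(eps, n=2000):
    """Arc length of Y(x) = x + eps*eta(x) on [0, 1], with eta(x) = x^2 (1 - x).

    Uses composite Simpson's rule on sqrt(1 + Y'(x)^2).
    """
    def integrand(x):
        eta_prime = 2.0 * x - 3.0 * x * x       # derivative of the hypothetical eta
        y_prime = 1.0 + eps * eta_prime         # Y'(x) = y'(x) + eps * eta'(x)
        return math.sqrt(1.0 + y_prime * y_prime)

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0

for eps in (-0.2, -0.1, 0.0, 0.1, 0.2):
    print(f"eps = {eps:+.1f}   J = {J(eps):.6f}")

d = 1e-4
print("dJ/deps at 0 is approximately", (J(d) - J(-d)) / (2 * d))   # central difference
```

Every nonzero ε should give a longer curve than ε = 0, and the estimated derivative at ε = 0 should be zero to within rounding and truncation error.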
To proceed to our next step, I now need to use a result in calculus called Leibniz’s rule for differentiating an integral. In the simple case we have, where the integration limits are not functions of $\varepsilon$, that rule reduces to the intuitively appealing result that the derivative of the integral is the integral of the derivative [the general rule, which is a bit more complicated, is nicely discussed by Marc Frantz, “Visualizing Leibniz’s Rule” (Mathematics Magazine, April 2001, pp. 143-45)], and so

$$\frac{dJ}{d\varepsilon} = \frac{d}{d\varepsilon}\int_{x_1}^{x_2} F\{x, Y(x), Y'(x)\}\,dx = \int_{x_1}^{x_2} \frac{\partial F}{\partial\varepsilon}\,dx,$$

where $\partial F/\partial\varepsilon$ denotes the partial derivative of F. The partial refers to the fact that F is a function of variables other than just $\varepsilon$, i.e., x, Y, and Y’.

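Using the chain rule, we can write $\partial F/\partial\varepsilon$ in terms of those other variables as

$$\frac{\partial F}{\partial\varepsilon} = \frac{\partial F}{\partial Y}\frac{\partial Y}{\partial\varepsilon} + \frac{\partial F}{\partial Y'}\frac{\partial Y'}{\partial\varepsilon} + \frac{\partial F}{\partial x}\frac{\partial x}{\partial\varepsilon}.$$

Since we have

$$\frac{\partial Y}{\partial\varepsilon} = \eta(x), \qquad \frac{\partial Y'}{\partial\varepsilon} = \eta'(x), \qquad \text{and} \qquad \frac{\partial x}{\partial\varepsilon} = 0,$$

then

$$\frac{dJ}{d\varepsilon} = \int_{x_1}^{x_2}\left[\frac{\partial F}{\partial Y}\eta(x) + \frac{\partial F}{\partial Y'}\eta'(x)\right]dx.$$

Since setting $\varepsilon = 0$ is equivalent to setting Y = y and Y’ = y’, we therefore have

$$\left.\frac{dJ}{d\varepsilon}\right|_{\varepsilon=0} = \int_{x_1}^{x_2}\left[\frac{\partial F}{\partial y}\eta(x) + \frac{\partial F}{\partial y'}\eta'(x)\right]dx = 0.$$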


To continue we next need to recall yet another result from calculus, the formula for integrating by parts that was developed in section 5.1: if g(x) and h(x) are two functions of x (in chapter 5, I used u(x) in place of g(x) and f(x) in place of h(x)), then

$$\int_{x_1}^{x_2} g\,dh = (gh)\Big|_{x_1}^{x_2} - \int_{x_1}^{x_2} h\,dg.$$

We can use this to integrate the second term of our expression for $dJ/d\varepsilon$. To do this, first set

$$g(x) = \frac{\partial F}{\partial y'}, \qquad dh = \eta'(x)\,dx.$$

Then,

$$\frac{dg}{dx} = \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right), \quad \text{or} \quad dg = \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)dx,$$

and

$$h(x) = \eta(x).$$

Thus,

$$\int_{x_1}^{x_2}\frac{\partial F}{\partial y'}\eta'(x)\,dx = \left.\frac{\partial F}{\partial y'}\eta(x)\right|_{x_1}^{x_2} - \int_{x_1}^{x_2}\eta(x)\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)dx.$$

However, since $\eta(x_1) = \eta(x_2) = 0$, we have

$$\left.\frac{\partial F}{\partial y'}\eta(x)\right|_{x_1}^{x_2} = 0,$$


and now you can see how convenient it was to stipulate earlier that $\eta(x)$ vanish at both $x = x_1$ and $x = x_2$! This, then, lets us write

$$\left.\frac{dJ}{d\varepsilon}\right|_{\varepsilon=0} = 0 = \int_{x_1}^{x_2}\left[\frac{\partial F}{\partial y}\eta(x) - \eta(x)\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)\right]dx = \int_{x_1}^{x_2}\eta(x)\left[\frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)\right]dx.$$

We are now at the final step in deriving the Euler-Lagrange equation, a step so “obvious” to Lagrange that he zipped right through it. Later mathematicians thought this final step to be not quite so obvious, and so provided proofs for it, but I’ll follow Lagrange and simply state the following so-called fundamental lemma of the calculus of variations (which I think is plausible):


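$$\text{if, for arbitrary } \eta(x), \quad \int_{x_1}^{x_2}\eta(x)H(x)\,dx = 0, \quad \text{then } H(x) = 0, \quad x_1 < x < x_2.$$

(Here “arbitrary” means any differentiable η(x) that vanishes at both endpoints, and H(x) is taken to be continuous.)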


For us, this means 


$$\boxed{\frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = 0,}$$


which I’ve put in a box because it is the famous Euler-Lagrange 
differential equation. Now, what do we do with it? That’s what the 
rest of this chapter is about. 
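One way to get a feel for the equation is to evaluate its left-hand side numerically for a trial curve. The sketch below defines a hypothetical helper, el_residual (our name, not a library function), that approximates ∂F/∂y - d/dx(∂F/∂y') by central differences for a given F(x, y, y') and a candidate y(x) whose derivative is supplied exactly. Using the arc-length integrand of the next section as the test case, a straight line should give a residual of essentially zero, while a parabola should not.

```python
import math

def partial(f, i, args, h=1e-5):
    """Central-difference partial derivative of f with respect to its i-th argument."""
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

def el_residual(F, y, dy, x, h=1e-3):
    """Approximate dF/dy - d/dx(dF/dy') along the curve y(x) at the point x.

    F is called as F(x, y, yp); dy is the exact derivative of y.
    """
    F_y = partial(F, 1, (x, y(x), dy(x)))

    def F_yp(t):
        return partial(F, 2, (t, y(t), dy(t)))

    return F_y - (F_yp(x + h) - F_yp(x - h)) / (2.0 * h)

def arc_length_integrand(x, y, yp):
    return math.sqrt(1.0 + yp * yp)

line = (lambda x: 2.0 * x + 1.0, lambda x: 2.0)     # y = 2x + 1
parabola = (lambda x: x * x, lambda x: 2.0 * x)     # y = x^2

for name, (y, dy) in (("line", line), ("parabola", parabola)):
    print(name, [round(el_residual(arc_length_integrand, y, dy, x), 6) for x in (0.0, 0.5, 1.0)])
```

For the parabola the exact residual is $-2/(1+4x^2)^{3/2}$, so a nonzero value at every x signals that y = x² cannot minimize arc length.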


6.5 The Straight Line and the Brachistochrone 


For our first application of the Euler-Lagrange equation, let’s prove 
that the curve of minimum length connecting two given points in 
a plane is a straight line. Recall that the differential length ds along 
the curve y = y(x) is 


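$$ds = \sqrt{(dx)^2 + (dy)^2} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx = \sqrt{1 + (y')^2}\,dx.$$

The total length of the curve connecting the points $(x_1, y_1)$ and $(x_2, y_2)$ is, therefore,

$$\int_{x_1}^{x_2} ds = \int_{x_1}^{x_2}\sqrt{1 + (y')^2}\,dx,$$

which means we have

$$F = \sqrt{1 + (y')^2} = \left[1 + (y')^2\right]^{1/2}.$$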


Since F has no explicit dependence on y, we have

$$\frac{\partial F}{\partial y} = 0,$$

and we also see that

$$\frac{\partial F}{\partial y'} = \frac{1}{2}\left[1 + (y')^2\right]^{-1/2}2y' = \frac{y'}{\sqrt{1 + (y')^2}}.$$


Inserting these two results into the Euler-Lagrange equation gives

$$\frac{d}{dx}\left[\frac{y'}{\sqrt{1 + (y')^2}}\right] = 0,$$

which immediately tells us that

$$\frac{y'}{\sqrt{1 + (y')^2}} = \text{constant}.$$

And this, in turn, immediately says that y’(x) = constant, which means y(x) = mx + b, where m and b are constants. This is, of course, the equation for a straight line, with m and b selected to make the line pass through the given points $(x_1, y_1)$ and $(x_2, y_2)$. So, at last, we have mathematical proof of what we all knew all along! This is nonetheless an important result, helping to build our confidence in the Euler-Lagrange equation.

As a further confidence builder, let’s solve another problem to which we also already know the answer. Recall from section 6.3 the expression for the time required by a bead to slide under gravity (and no friction) along the curve y = y(x) from (0,0) to (L,L):

$$T = \int_0^L \sqrt{\frac{1 + (y')^2}{2gy}}\,dx = \frac{1}{\sqrt{2g}}\int_0^L \sqrt{\frac{1 + (y')^2}{y}}\,dx.$$

Thus, ignoring the constant factor of $1/\sqrt{2g}$, we have

$$F = \left[\frac{1 + (y')^2}{y}\right]^{1/2}.$$
Unlike the previous example for the straight line, this F has an explicit dependence on y. This is a complication, but, on the other hand, notice as well that this F does not explicitly depend on x. In this case, then, we can use a “reduced” form of the Euler-Lagrange equation, derived in 1868 by the Italian mathematician Eugenio Beltrami (1835-1900):

$$\text{if } \frac{\partial F}{\partial x} = 0, \text{ then } F - y'\frac{\partial F}{\partial y'} = \text{constant},$$

a result called Beltrami’s identity (derived in appendix G). Substituting the F for the descent-time integral into Beltrami’s identity gives (with K some constant)

$$\left[\frac{1 + (y')^2}{y}\right]^{1/2} - y'\cdot\frac{1}{2}\left[\frac{1 + (y')^2}{y}\right]^{-1/2}\frac{2y'}{y} = K.$$

Or, with just a little bit of algebra,

$$y\left[1 + (y')^2\right] = \frac{1}{K^2},$$

which, replacing the constant $1/K^2$ with C, becomes

$$y\left[1 + \left(\frac{dy}{dx}\right)^2\right] = C.$$
But this is precisely the differential equation for y = y(x) that was derived in section 6.2 for the brachistochrone using Bernoulli’s optical analogy. This means that, once again, the Euler-Lagrange equation has given us the correct answer.
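As a quick numerical cross-check (a sketch, not part of the derivation), a cycloid x = a(φ - sin φ), y = a(1 - cos φ), with y measured downward as in the descent-time integral, should make y[1 + (dy/dx)²] the same at every point; working the algebra by hand gives C = 2a. The value of a below is an arbitrary illustrative choice, and the slope is computed from the parametric derivatives as (dy/dφ)/(dx/dφ).

```python
import math

A = 0.75   # cycloid parameter a (arbitrary illustrative value)

def check_point(phi, a=A):
    """Return y * (1 + (dy/dx)^2) at parameter phi on the cycloid."""
    x_phi = a * (1.0 - math.cos(phi))   # dx/dphi
    y_phi = a * math.sin(phi)           # dy/dphi
    y = a * (1.0 - math.cos(phi))
    slope = y_phi / x_phi               # dy/dx = (dy/dphi) / (dx/dphi)
    return y * (1.0 + slope * slope)

values = [check_point(phi) for phi in (0.3, 1.0, 2.0, 3.0, 4.0, 5.5)]
print(values)          # every entry should be close to 2a = 1.5
```

The points avoid φ = 0 and φ = 2π, where dx/dφ vanishes and the slope is undefined.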


6.6 Galileo’s Hanging Chain 


This chapter started with Galileo, and in this section we return to 
him once more. Imagine a given length of flexible, linked chain 
hanging under gravity from nails at each end, as shown in figure 
6.14. A modern version of the hanging chain is a telephone wire
or power transmission line hanging from adjacent support poles
or towers. Our question here is easy to state: what is the shape of
the hanging chain? Galileo’s answer, in his Discourses of 1638, was
equally plainspoken: a parabola. Since the Latin word for chain is
catena, the shape of the hanging chain was said (by Huygens in a
1690 letter to Leibniz) to be a catenary, and Galileo’s claim, then,
was that the catenary is a parabolic curve.

FIGURE 6.14. Galileo’s hanging chain.

Galileo was, however, wrong, and it was known for quite some 
time that he was wrong. The German mathematician Joachim Jun- 
gius (1587-1657) is generally given credit for formally establishing 
this negative result in 1669. It wasn’t until 1691, however, that 
Leibniz and Johann Bernoulli actually figured out what the catenary 
is. (You'll see, in just a bit, however, how it could be known that the 
catenary is not a parabola—Huygens claimed to have known this 
since his teenage years—without knowing what it actually is.) 

We can solve Galileo’s problem without the calculus of variations 
and, of course, that is how it must have been solved in 1691, five years 
before the brachistochrone challenge and long before the develop- 
ment of the Euler-Lagrange equation. Following how Bernoulli did 
this will give us yet another solution that we can again compare to 
what the Euler-Lagrange equation will tell us (and, of course, we will get the same answer). Bernoulli started by making the obvious observation that no matter how the chain hangs, it will have a lowest point. Let’s call this point A, as shown in figure 6.15, and position the coordinate system so that A is on the y-axis. The tangent to the curve of the hanging chain at A is clearly horizontal. Bernoulli then said the chain to the left of A (x < 0) is completely represented by the left-directed horizontal force it exerts on the rest of the chain to the right of A. Let’s call that unknown force (or tension) at A $T_A$; whatever it is, it is a constant.

FIGURE 6.15. Static forces acting on a hanging chain.

Bernoulli then marked B as an arbitrary point with coordinates (x, y) on the chain, as shown in figure 6.15. If the mass M of the chain is uniformly distributed along its length L, then the mass density is simply $\rho = M/L$, a constant. So, if the length of the section of chain from A to B is s, and if as usual g is the acceleration of gravity, then the weight of the chain section between A and B is $\rho s g$. Finally, Bernoulli denoted the tension in the chain at B by T, directed of course along the tangent to the chain’s curve at B. Call the angle that T makes with the horizontal at B $\theta$ (as shown in figure 6.15). Bernoulli then used the physical observation that the section of chain from A to B is not moving. Thus, the net force acting on it must be zero. In particular, the sum of the horizontal forces, and the sum of the vertical forces, must each be zero:

$$T_A = T\cos(\theta) \quad \text{(horizontal force equation)}$$

and

$$\rho g s = T\sin(\theta) \quad \text{(vertical force equation)}.$$

Dividing the second equation by the first, and writing the constant $T_A/\rho g$ as simply k, we have

$$\frac{T\sin(\theta)}{T\cos(\theta)} = \frac{\rho g s}{T_A} = \tan(\theta) = \frac{1}{k}s.$$

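Since tan(θ) is the slope of the catenary at B, we can write

$$\frac{dy}{dx} = \frac{1}{k}s.$$

Differentiating both sides with respect to x gives Bernoulli’s differential equation for the catenary:

$$\frac{d^2y}{dx^2} = \frac{1}{k}\frac{ds}{dx} = \frac{1}{k}\frac{\sqrt{(dx)^2 + (dy)^2}}{dx} = \frac{1}{k}\sqrt{1 + \left(\frac{dy}{dx}\right)^2}.$$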


At this point I’ll depart from Bernoulli’s calculations (which be- 
came quite complicated) and continue with a pretty little trick that 
wasn’t developed until years later (in 1712) but which will allow 
us to neatly and quickly solve Bernoulli’s differential equation for 
y = y(x). Following the lead of the Italian mathematician Jacopo 
Francesco Riccati (1676-1754), let’s define the new variable p as 

$$p = \frac{dy}{dx}, \quad \text{and so} \quad \frac{dp}{dx} = \frac{d^2y}{dx^2}.$$


Then, Bernoulli’s equation becomes

$$\frac{dp}{dx} = \frac{1}{k}\sqrt{1 + p^2}, \quad \text{or} \quad \frac{dp}{\sqrt{1 + p^2}} = \frac{1}{k}\,dx,$$

which can now be easily integrated. A table of indefinite integrals tells us, in fact, that

$$\int\frac{dp}{\sqrt{1 + p^2}} = \sinh^{-1}(p) = \frac{x - C_1}{k},$$

where $C_1$ is the constant of integration, and so

$$\frac{dy}{dx} = p = \sinh\left(\frac{x - C_1}{k}\right).$$

This is even easier to integrate once more; since the derivative of the hyperbolic cosine is the hyperbolic sine, we can immediately write the equation of the catenary as

$$y(x) = k\cosh\left(\frac{x - C_1}{k}\right) + C_2,$$

where $C_2$ is another constant of integration.

We clearly have three adjustable constants to play with here
(remember, k has the unknown tension $T_A$ in it), and they can be
determined from three conditions. Obvious candidates for those 
conditions are the coordinates of the ends of the catenary, and the 
length of the hanging chain. To be honest, however, the numerical 
work in finding those three constants can in general be a formidable 
task, and I will pursue it no further. (A short but quite interesting 
essay on how to do the number crunching is by Paul Cella, “Reexam- 
ining the Catenary,” College Mathematics Journal, November 1999, 
pp. 391-93.) 
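For the special, symmetric case the number crunching is manageable. Assume (our assumption, for illustration) nails at (-a, h) and (a, h) and a chain of length L > 2a. Symmetry gives C₁ = 0; the length condition, the integral of $\sqrt{1 + \sinh^2(x/k)}$ from -a to a, which equals 2k sinh(a/k) = L, fixes k; and then C₂ = h - k cosh(a/k). The sketch solves the transcendental equation for k by bisection. The function name symmetric_catenary and the sample numbers are hypothetical.

```python
import math

def symmetric_catenary(a, h, length):
    """Return (k, C2) for y = k*cosh(x/k) + C2 through (-a, h), (a, h) with arc length `length`.

    Solves 2*k*sinh(a/k) = length by bisection; requires length > 2*a.
    """
    if length <= 2.0 * a:
        raise ValueError("chain must be longer than the gap between the nails")

    def excess(k):
        return 2.0 * k * math.sinh(a / k) - length   # strictly decreasing in k

    lo = a / 500.0           # small k: excess is hugely positive (sinh(500) still fits in a float)
    hi = a
    while excess(hi) > 0.0:  # grow hi until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    return k, h - k * math.cosh(a / k)

k, c2 = symmetric_catenary(a=50.0, h=30.0, length=120.0)
print(f"k = {k:.4f}, C2 = {c2:.4f}, sag below the nails = {30.0 - (k + c2):.4f}")
```

Unequal nail heights bring C₁ back into play and lead to a coupled pair of transcendental equations, which is where the general problem becomes the formidable task described above.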

I have written before of Johann Bernoulli’s obsessively competi- 
tive personality, and the solution of the catenary problem provides 
yet another illustration of that unpleasant aspect of his nature. Writ- 
ing a quarter of a century later of his memories of the competition 
between himself and his brother Jacob, he still gloried as much in 
his long-dead brother’s failure as in his own success. As he wrote in 
a 1718 letter to a French correspondent: 


The efforts of my brother were without success; for my part, I 
was more fortunate, for I found the skill (I say it without boasting, 
why should I conceal the truth?) to solve it in full and to reduce 
it to the rectification of the parabola. It is true that it cost me 
study that robbed me of rest for an entire night. It was much 
for those days and for the slight age and practice I then had, but
the next morning, filled with joy, I ran to my brother, who was 
still struggling miserably with this Gordian knot without getting 
anywhere, always thinking like Galileo that the catenary was a 
parabola. Stop! Stop! I say to him, don’t torture yourself any more 
to try to prove the identity of the catenary with the parabola, since 
it is entirely false. The parabola indeed serves in the construction 
of the catenary, but the two curves are so different that one is 
algebraic, the other is transcendental. 


(This example of Johann’s petty nastiness toward Jacob was not 
an isolated case. In an earlier 1712 letter to a mathematician in 
England, Johann called the forthcoming posthumous publication of 
Jacob’s seminal probability book Ars Conjectandi “A monster which 
bears my brother’s name.”) 

One last point. We see now that the catenary is not a parabola 
as Galileo believed because we have calculated what it actually is 
(a hyperbolic cosine). How could it have been discovered that the 
catenary is not a parabola without finding what it actually is? This 
can be done by a simple negative demonstration. That is, suppose 
we examine the curve of the suspension cable of a bridge, attached 
to a heavy bridge roadway, as shown in figure 6.16. Now we have 
a chain (the cable) hanging not by virtue of just its own mass, but 
also because of the enormously greater load of the massive bridge
deck connected to the suspension cable by a very large number of
vertical hanger wires.

FIGURE 6.16. A hanging cable with uniform horizontal loading is a parabola. [Figure labels: suspension cable; bridge deck; x-axis along the deck.]

In the original catenary problem, the force
on the cable was due just to the mass of the cable itself, but in the 
suspension cable case, the cable mass is insignificant compared to 
the mass of the supported bridge deck. If we suppose the bridge deck 
has a uniform mass distribution along its length, and if we position 
the coordinate axes as shown in figure 6.16, then the equation for 
the catenary

$$\frac{dy}{dx} = \frac{1}{k}s$$

is replaced with

$$\frac{dy}{dx} = Kx,$$

where K is some constant. This is immediately integrable to give

$$y = \frac{1}{2}Kx^2 + C_2,$$

or, writing $\frac{1}{2}K = C_1$, we have the parabola

$$y = C_1x^2 + C_2.$$


That is, assuming uniform mass density along the cable gives a cate- 
nary, while assuming uniform mass density along the x-axis gives the 
parabola. Two different assumptions, two different curves. 

The physical interpretations of the constants $C_1$ and $C_2$ for the
suspension bridge cable’s parabolic curve are easy to see. $C_2$ is simply
the height of the cable’s low point (at x = 0) above the bridge
deck. Also, if the tops of adjacent, uniform-height support towers
are at (-a, b) and (a, b), where a and b are both positive, then
$C_1 = (b - C_2)/a^2$.
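To see that the two assumptions really do give different curves, the sketch below (with illustrative numbers of our choosing) fits both shapes to the same towers at (±a, b) and the same low-point height C₂. The parabola uses C₁ = (b - C₂)/a² from above; the catenary y = k cosh(x/k) + (C₂ - k) has its low point at height C₂ by construction, and k is found numerically from k[cosh(a/k) - 1] = b - C₂ by bisection. The helper names are hypothetical.

```python
import math

def parabola_coeff(a, b, c2):
    """C1 for the loaded-cable parabola y = C1 x^2 + C2 through (+-a, b)."""
    return (b - c2) / a ** 2

def catenary_k(a, sag):
    """Solve k*(cosh(a/k) - 1) = sag for k by bisection (the left side decreases in k)."""
    def excess(k):
        return k * (math.cosh(a / k) - 1.0) - sag
    lo, hi = a / 500.0, a
    while excess(hi) > 0.0:          # grow hi until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if excess(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

a, b, c2 = 1.0, 2.0, 1.0        # towers at (+-1, 2), low point at height 1 (hypothetical)
c1 = parabola_coeff(a, b, c2)
k = catenary_k(a, b - c2)
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    y_par = c1 * x * x + c2
    y_cat = k * math.cosh(x / k) + (c2 - k)
    print(f"x = {x:4.2f}   parabola {y_par:.4f}   catenary {y_cat:.4f}")
```

The two curves agree at the low point and at the towers but separate in between, and the gap grows with the sag.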

As with the brachistochrone, the catenary has been mentioned 
in fictional literature. One character in Mark Helprin’s 1983 novel 
Winter’s Tale, for example, proclaims: 


A bridge is a very special thing. Haven’t you seen how delicate they 
are in relation to their size? They soar like birds; they extend and 
embody our finest efforts; and they utilize the curve of heaven.
When a catenary of steel a mile long is hung in the clear over a 
river, believe me, God knows... . I would go as far as to say that 
the catenary, this marvelous graceful thing, this joy of physics, this 
perfect balance between rebellion and obedience, is God’s own 
signature on earth. I think it pleases Him to see them raised. 


Beautiful words, yes, but it would of course have been better for Hel- 
prin to have clearly distinguished between the unloaded hyperbolic 
catenary and the loaded parabola as the curve of the bridge’s cable. 


6.7 The Catenary Again 


To see how the calculus of variations handles the catenary prob- 
lem, let’s return to the physical principle we used in the analysis of 
de L’Hospital’s pulley problem in section 5.7. There it was argued 
that a massive body hanging under the effect of gravity alone (as 
does Galileo’s chain—see figure 6.14 again) will hang in such a way 
as to minimize its total gravitational potential energy. For the pul- 
ley problem we needed only ordinary calculus, as the pulley was 
modeled as a single point mass and the supporting cables were taken 
as massless. For the catenary problem, however, the entire length 
of hanging chain has mass and so we have a massive, spatially dis- 
tributed body. 

To express the catenary problem mathematically, let’s assume as before that the chain’s mass is uniformly distributed with constant density $\rho$. If we look at a differential length (ds) of the chain, located at the arbitrary point (x, y) on the curve y = y(x), then the differential mass is $dm = \rho\,ds$ and the potential energy of that differential mass is $(\rho\,ds)gy$, where g is, as usual, the acceleration of gravity. Thus, the total potential energy of the hanging chain, which we wish to minimize by finding the “right” curve y = y(x), is given by the integral

$$J = \int_{x_1}^{x_2}\rho g y\,ds = \int_{x_1}^{x_2}\rho g y\sqrt{(dx)^2 + (dy)^2} = \int_{x_1}^{x_2}\rho g y\sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx = \int_{x_1}^{x_2}\rho g y\sqrt{1 + (y')^2}\,dx.$$
Complicating matters just a bit here, however, is the first of the two extensions I mentioned back in section 6.4: we need to minimize J under the constraint of a given, fixed length of chain. That is, we must find that y = y(x) that minimizes J while keeping the chain’s length L constant:

$$\int_{x_1}^{x_2} ds = \int_{x_1}^{x_2}\sqrt{(dx)^2 + (dy)^2} = \int_{x_1}^{x_2}\sqrt{1 + (y')^2}\,dx = L = \text{constant}.$$

The clever idea that allows this additional twist to be taken into account is simply to write

$$\int_{x_1}^{x_2}\sqrt{1 + (y')^2}\,dx - L = 0 = \int_{x_1}^{x_2}\left[\sqrt{1 + (y')^2} - \frac{L}{x_2 - x_1}\right]dx,$$

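and then to argue that we can add zero (as many times as we wish) to J and we will have changed nothing! That is, with λ anything (λ is formally called a Lagrange multiplier) we will minimize not J but rather

$$\int_{x_1}^{x_2}\left\{\rho g y\sqrt{1 + (y')^2} + \lambda\left[\sqrt{1 + (y')^2} - \frac{L}{x_2 - x_1}\right]\right\}dx.$$

The integrand function to be inserted into the Euler-Lagrange equation is therefore

$$F = \rho g y\sqrt{1 + (y')^2} + \lambda\sqrt{1 + (y')^2} - \frac{\lambda L}{x_2 - x_1}.$$

Notice, however, that the last term, if we take λ as any constant, is itself a constant and thus it will immediately vanish upon taking any derivative. (Where F itself appears undifferentiated, as in Beltrami’s identity, such a constant is simply absorbed into the arbitrary constant on the right-hand side.) So, the actual F that we need to consider is simply

$$F = \rho g y\sqrt{1 + (y')^2} + \lambda\sqrt{1 + (y')^2}.$$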


That is, F is the integrand of the integral we wish to minimize plus 
a yet undetermined constant multiple of the constraint integral’s 
integrand. 

Notice that this F does not have an explicit dependence on x, 
and so we can use the already partially integrated form of the Euler- 
Lagrange equation called Beltrami’s identity, just as we did in section 
6.5. That is, 
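$$-F + y'\frac{\partial F}{\partial y'} = C_1 \quad \text{(a constant)}.$$

Since

$$\frac{\partial F}{\partial y'} = \rho g y\cdot\frac{1}{2}\left[1 + (y')^2\right]^{-1/2}2y' + \lambda\cdot\frac{1}{2}\left[1 + (y')^2\right]^{-1/2}2y' = \frac{\rho g y y'}{\sqrt{1 + (y')^2}} + \frac{\lambda y'}{\sqrt{1 + (y')^2}},$$

then we have

$$-\rho g y\sqrt{1 + (y')^2} - \lambda\sqrt{1 + (y')^2} + \frac{\rho g y(y')^2}{\sqrt{1 + (y')^2}} + \frac{\lambda(y')^2}{\sqrt{1 + (y')^2}} = C_1.$$

This reduces, after just a bit of simple algebra, to

$$(y')^2 = \left(\frac{\rho g y + \lambda}{C_1}\right)^2 - 1.$$

We can now get two useful expressions from this. First,

$$\rho g y + \lambda = C_1\sqrt{1 + (y')^2}.$$

And second, differentiation with respect to x of the $(y')^2$ expression gives

$$2y'y'' = \frac{2(\rho g y + \lambda)\rho g y'}{C_1^2},$$

which reduces (with the aid of the first expression) to

$$y'' = \frac{(\rho g y + \lambda)\rho g}{C_1^2} = \frac{\rho g C_1\sqrt{1 + (y')^2}}{C_1^2} = \frac{\rho g}{C_1}\sqrt{1 + (y')^2}.$$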


But this is precisely the same differential equation Bernoulli arrived
at in the previous section (with $C_1/\rho g$ replaced with k) by summing
the horizontal and vertical forces on a section of the hanging chain.
And we have already solved that (and this) equation using Riccati’s 
trick of defining p = y’. 
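A numerical spot-check (a sketch; the values of ρg, λ, and k below are arbitrary illustrative choices) confirms that the hyperbolic-cosine solution really satisfies the Beltrami form used above. Along y(x) = k cosh(x/k) - λ/(ρg), the quantity -F + y'∂F/∂y', with F = (ρgy + λ)√(1 + (y')²), should not change with x. Working it out by hand, it equals -(ρgy + λ)/√(1 + (y')²) = -ρgk, so the constant's magnitude |C₁| is ρgk, consistent with identifying k with C₁/ρg up to sign.

```python
import math

RHO_G = 2.0   # rho * g (arbitrary illustrative value)
LAM = 0.7     # Lagrange multiplier lambda (arbitrary illustrative value)
K = 1.3       # catenary parameter k (arbitrary illustrative value)

def y(x):
    """Candidate curve: the catenary shifted so that rho*g*y + lambda = rho*g*k*cosh(x/k)."""
    return K * math.cosh(x / K) - LAM / RHO_G

def yp(x):
    return math.sinh(x / K)

def F(yv, p):
    return (RHO_G * yv + LAM) * math.sqrt(1.0 + p * p)

def F_p(yv, p):
    """Exact partial derivative of F with respect to y'."""
    return (RHO_G * yv + LAM) * p / math.sqrt(1.0 + p * p)

def beltrami(x):
    """-F + y' dF/dy' evaluated along the candidate curve."""
    return -F(y(x), yp(x)) + yp(x) * F_p(y(x), yp(x))

print([round(beltrami(x), 12) for x in (-2.0, -0.5, 0.0, 1.0, 2.5)])
print("-rho*g*k =", -RHO_G * K)
```

Replacing the cosh curve with, say, a parabola makes the printed values drift with x, which is the numerical signature of a curve that is not a catenary.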
Well, all this is wonderful, you say, but so far all we have done is
use the calculus of variations to solve problems previously solved by 
other means! Is this all we are going to get from the Euler-Lagrange 
equation? The answer is no, and in the final sections of this chapter 
I’ll show you a couple of problems that the Euler-Lagrange formu- 
lation handles easily (including, at last, a proof of the isoperimetric 
theorem) that we have not found solutions to before. 

Before leaving the catenary, however, let me tell you about one 
last wonderful property it has, one that, while known for centuries, 
seems not to be well known. Imagine that you want to build an 
arch (e.g., the entrance to a church) out of a nonorganic material 
(i.e., something that won’t rot) that is very strong in compression 
but weak in tension. Such materials are brick, concrete, and stone, 
materials readily available to construction engineers for thousands 
of years. (Wood is very strong in both compression and tension, but 
it eventually decays.) The trick, then, to using bricks, concrete, and 
stone when building strong, durable structures is to avoid tension; 
in particular, we should construct our arch so that at every point 
there is only compression. 

Now, think of the catenary, the curve of a chain hanging in 
complete repose. It is, at every point, in tension only, i.e., there 
clearly is no point where a hanging chain is in compression. This was 
apparently first pointed out in 1675 by Newton’s contemporary (and 
sometimes rival) Robert Hooke (1635-1703). (After Hooke loudly 
claimed he was the true discoverer of the inverse square law of 
gravity, Newton deleted all mention of Hooke from the Principia. It 
didn’t pay to irritate Newton!) Further, Hooke went on to observe, if 
the hanging catenary was “frozen in place” (e.g., glue the links of the 
flexible chain together) and then inverted, the resulting arch would be 
in compression only, and at no point would there be tension. Thus, 
an inverted catenary is the best (strongest) curve for a stone arch. 

Hooke did not publish this result as a formal mathematical deduc- 
tion (he was, in fact, not a very good mathematician), and it wasn’t 
until considerably after Hooke’s time that that was done. This is il- 
lustrated in a letter (dated December 23, 1788) written by Thomas 
Jefferson, in reply to a letter he had received a few months earlier. In 
his letter, Jefferson’s correspondent had described his uncertainty in 
deciding between using a circular or a catenarian arch for the curve 
of the iron support tubes in the construction of a bridge. In his re- 
ply, Jefferson reports having just read a treatise on bridge arches, 
written by the Italian mathematician Lorenzo Mascheroni (1750-
1800) which showed “every part of the Catenary is in perfect equi- 
librium.” A modern example of an inverted catenary arch is the huge 
St. Louis Gateway Arch, made of stainless steel and standing 630 
feet high. You can find much more on the use of the catenary in 
construction in the beautiful little book by Jacques Heyman, The 
Stone Skeleton: Structural Engineering of Masonry Architecture (Cam- 
bridge University Press 1995). 


Here’s a calculus of variations problem, of engineering im- 
portance, for you to try your hand at, and to which I don’t 
think you can guess the answer by intuition. It occurs in the 
statistical theory of communication and information (and so 
electrical engineers are interested in it), but you don’t have to 
know anything about those fields to do the math. The pure 
mathematical problem is simply this: 


find the $y(x)$ that maximizes $J = -\int_{-\infty}^{\infty} y(x)\ln\{y(x)\}\,dx$, subject to the constraints $y(x) = 0$ unless $0 \le x \le M$ and $\int_{-\infty}^{\infty} y(x)\,dx = 1$, where $M$ is a given positive constant.


For those who are curious, y(x) is the probability density func- 
tion of some nonnegative random variable, the J integral is 
the entropy (a measure of information) of that random vari- 
able, and M is the maximum value of that random variable. 
The constraint integral is simply the obvious statement that 
the total probability that the random variable has a value some- 
where between minus infinity and plus infinity is one. But, as I 
said before, you don’t really need to know any of this to solve 
the problem. Can you see why the solution y(x) maximizes J 
(as opposed to minimizing it)? The solution is at the end of 
this chapter. 


6.8 The Isoperimetric Problem, Solved (at last!) 


The first complete proof of the ancient isoperimetric problem using 
the calculus of variations is due to the German mathematician Karl 
Weierstrass (1815-97), dating from the period 1879-82. He never
published that work, perhaps because of extremely poor health (af- 
ter 1860 he could lecture only while sitting down as a student wrote 
the mathematics on a blackboard), but a record of it nevertheless 
survived. His students kept detailed notes of his lectures, and in 1927 
his isoperimetric analyses were at last published. 

Casting the isoperimetric problem into the generic form required
by the Euler-Lagrange equation throws yet another twist at us.
Recall the formula given in section 6.2 for the area enclosed by a
closed, non-self-intersecting curve $C$ that is traced out by a clockwise
moving point in the time interval from 0 to $T$: if the parametric
equations of $C$ are $x = x(t)$ and $y = y(t)$, then

$$\text{area enclosed by } C = \frac{1}{2}\int_0^T \left\{ y(t)\frac{dx}{dt} - x(t)\frac{dy}{dt} \right\} dt.$$
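The area formula lends itself to a direct numerical check. In the sketch below (an illustration, not from the text), the integral is replaced by a sum over closely spaced points on the curve, which turns out to be the familiar "shoelace" polygon-area formula with the sign convention for clockwise traversal. The test curve is the clockwise ellipse $x = -a\cos t$, $y = b\sin t$ that appears later in this chapter, so the result should be close to $\pi ab$; the values $a = 3$, $b = 2$ and the number of points are arbitrary assumptions.

```python
import math

def enclosed_area(xs, ys):
    """Discrete version of (1/2) * integral of (y dx/dt - x dy/dt) dt.

    Assumes the points (xs[k], ys[k]) trace a closed, non-self-intersecting
    curve in the clockwise sense; the last point connects back to the first.
    """
    n = len(xs)
    total = 0.0
    for k in range(n):
        j = (k + 1) % n  # next point, wrapping around to close the curve
        # y_k*(x_j - x_k) - x_k*(y_j - y_k) simplifies to y_k*x_j - x_k*y_j
        total += ys[k] * xs[j] - xs[k] * ys[j]
    return 0.5 * total

# Clockwise ellipse x = -a cos t, y = b sin t (illustrative values)
a, b, n = 3.0, 2.0, 4000
ts = [2 * math.pi * k / n for k in range(n)]
xs = [-a * math.cos(t) for t in ts]
ys = [b * math.sin(t) for t in ts]
print(enclosed_area(xs, ys), math.pi * a * b)
```

Traversing the same points in the opposite (counterclockwise) order flips the sign of the sum, which is why the formula is tied to the clockwise convention.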


We want to find the C that maximizes this integral, given a fixed 
prescribed perimeter. That is, we want to maximize the area integral 
subject to a perimeter constraint, just as we had a length constraint 
in the previous section in the hanging chain problem. A differential 
length of the curve C is, of course, 


$$ds = \sqrt{(dx)^2 + (dy)^2} = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt,$$


and so the total perimeter is (in terms of time) given by 


$$\int_C ds = \int_0^T \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt.$$

With the area and perimeter integrals written out explicitly like 
this we can see the new complication—both x and y are functions 
of a new, third variable, time, which we did not have in our earlier 
work. The easiest way to see how to handle this new feature is to 
simply back up and rederive the Euler-Lagrange equation, taking 
time into consideration (as you'll soon see, we will actually get two 
Euler-Lagrange equations). So, just as we did before, let’s assume 
x = x(t) and y = y(t) are the parametric equations for the solution 
curve C that we seek, and write perturbations around the solution as 
$$X(t) = x(t) + \epsilon\mu_1(t)$$
$$Y(t) = y(t) + \epsilon\mu_2(t).$$

($\mu_1(t)$ and $\mu_2(t)$ are now two differentiable, independent, arbitrary
functions, and $\epsilon$ is some constant.) Our problem is to find the
extrema of the functional


$$J(\epsilon) = \int_{t_1}^{t_2} F\{t, X(t), \dot X(t), Y(t), \dot Y(t)\}\,dt,$$


where I am using Newton’s dot-notation for time derivatives (see 
section 4.4) to distinguish them from our earlier derivatives that 
were with respect to x, i.e., 
$$\dot X(t) = \frac{d}{dt}X(t) \quad\text{and}\quad \dot Y(t) = \frac{d}{dt}Y(t).$$


So, 


$$\dot X(t) = \dot x(t) + \epsilon\dot\mu_1(t), \qquad \dot Y(t) = \dot y(t) + \epsilon\dot\mu_2(t).$$


We can write the perimeter constraint integral as 


$$P(\epsilon) = \int_{t_1}^{t_2} G\{t, X(t), \dot X(t), Y(t), \dot Y(t)\}\,dt,$$


which is a given constant. So, doing as we did in the previous
section, let’s form the sum integral $J + \lambda P$ (where $\lambda$, the Lagrange
multiplier, is for now any constant); finding the curve $C$ that gives
the extrema of $J + \lambda P$ will find the curve that gives the extrema of
$J$ while also satisfying the perimeter constraint.

What we have, then, is 


$$J(\epsilon) + \lambda P(\epsilon) = \int_{t_1}^{t_2} (F + \lambda G)\,dt,$$


where $F$ is the integrand of the original $J$ integral and $G$ is the inte-
grand of the constraint integral. To keep the notation from becom-
ing “busy,” let’s call $F + \lambda G = H$. Then,
$$J(\epsilon) + \lambda P(\epsilon) = \int_{t_1}^{t_2} H\{t, X(t), \dot X(t), Y(t), \dot Y(t)\}\,dt,$$


and we know that if $J(\epsilon) + \lambda P(\epsilon)$ has an extremum, it occurs at
$\epsilon = 0$ (because that’s how we constructed things!). Therefore, just as
before,


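$$\frac{d}{d\epsilon}\left\{J(\epsilon) + \lambda P(\epsilon)\right\}\bigg|_{\epsilon=0} = \int_{t_1}^{t_2} \frac{\partial H}{\partial \epsilon}\bigg|_{\epsilon=0} dt = 0.$$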
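Writing $\partial H/\partial\epsilon$ out in terms of the other variables, we have

$$\frac{\partial H}{\partial \epsilon} = \frac{\partial H}{\partial X}\frac{\partial X}{\partial \epsilon} + \frac{\partial H}{\partial \dot X}\frac{\partial \dot X}{\partial \epsilon} + \frac{\partial H}{\partial Y}\frac{\partial Y}{\partial \epsilon} + \frac{\partial H}{\partial \dot Y}\frac{\partial \dot Y}{\partial \epsilon} = \frac{\partial H}{\partial X}\mu_1 + \frac{\partial H}{\partial \dot X}\dot\mu_1 + \frac{\partial H}{\partial Y}\mu_2 + \frac{\partial H}{\partial \dot Y}\dot\mu_2.$$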
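Because setting $\epsilon = 0$ is equivalent to setting $X(t)$ to $x(t)$ and $Y(t)$
to $y(t)$, we have

$$0 = \int_{t_1}^{t_2} \left\{ \frac{\partial H}{\partial x}\mu_1 + \frac{\partial H}{\partial \dot x}\dot\mu_1 + \frac{\partial H}{\partial y}\mu_2 + \frac{\partial H}{\partial \dot y}\dot\mu_2 \right\} dt.$$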


Analogous to what we did in the original derivation of the Euler-
Lagrange equation, let’s assume that both $\mu_1(t)$ and $\mu_2(t)$ vanish
at times $t = t_1$ and $t = t_2$ (the given start and stop times of our
functional integral $J$). That is, $\mu_1(t_1) = \mu_1(t_2) = \mu_2(t_1) = \mu_2(t_2) = 0$.
Now, remember that we are free to separately choose $\mu_1(t)$ and $\mu_2(t)$
in any way we wish, as long as both vanish at $t = t_1$ and $t = t_2$. In
particular, we could choose $\mu_2(t) = 0$ for all $t$. Then,


$$\int_{t_1}^{t_2}\left(\frac{\partial H}{\partial x}\mu_1 + \frac{\partial H}{\partial \dot x}\dot\mu_1\right)dt = 0.$$


Or we could choose $\mu_1(t) = 0$ for all $t$, and thus conclude that

$$\int_{t_1}^{t_2}\left(\frac{\partial H}{\partial y}\mu_2 + \frac{\partial H}{\partial \dot y}\dot\mu_2\right)dt = 0.$$


Just as in our earlier derivation of the Euler-Lagrange equation,
we now simply integrate (by parts) the second term of each of these
two integrals. Everything goes through just as before (I’ll leave the
filling in of all the easy details for you), but now we will end up with
two Euler-Lagrange equations. I’ll enclose them in a box because of
their central importance in the calculus of variations:

$$\boxed{\frac{\partial H}{\partial x} - \frac{d}{dt}\left(\frac{\partial H}{\partial \dot x}\right) = 0, \qquad \frac{\partial H}{\partial y} - \frac{d}{dt}\left(\frac{\partial H}{\partial \dot y}\right) = 0.}$$


Now, at last, we can solve the isoperimetric problem. We
have, from the start of this section, the integrand of the area func-
tional as

$$F = \frac{1}{2}(y\dot x - x\dot y)$$

and the integrand of the perimeter constraint as

$$G = \left(\dot x^2 + \dot y^2\right)^{1/2}.$$

Thus,

$$H = \frac{1}{2}(y\dot x - x\dot y) + \lambda\left(\dot x^2 + \dot y^2\right)^{1/2}.$$


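So, for our first Euler-Lagrange equation, we have

$$\frac{\partial H}{\partial \dot x} = \frac{1}{2}y + \lambda\,\frac{1}{2}\left(\dot x^2 + \dot y^2\right)^{-1/2} 2\dot x = \frac{1}{2}y + \lambda\dot x\left(\dot x^2 + \dot y^2\right)^{-1/2},$$

and therefore

$$\frac{d}{dt}\left(\frac{\partial H}{\partial \dot x}\right) = \frac{1}{2}\dot y + \frac{d}{dt}\left\{\frac{\lambda\dot x}{\sqrt{\dot x^2 + \dot y^2}}\right\}.$$

Since

$$\frac{\partial H}{\partial x} = -\frac{1}{2}\dot y,$$

the first Euler-Lagrange equation becomes

$$-\frac{1}{2}\dot y - \frac{1}{2}\dot y - \frac{d}{dt}\left\{\frac{\lambda\dot x}{\sqrt{\dot x^2 + \dot y^2}}\right\} = 0,$$

or

$$\frac{d}{dt}\left\{\frac{\lambda\dot x}{\sqrt{\dot x^2 + \dot y^2}}\right\} = -\dot y,$$

which can be integrated by inspection to give

$$y + \frac{\lambda\dot x}{\sqrt{\dot x^2 + \dot y^2}} = C_1, \quad \text{a constant}.$$

Thus, at last,

$$y - C_1 = -\frac{\lambda\dot x}{\sqrt{\dot x^2 + \dot y^2}}.$$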


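Repeating the calculations for the second Euler-Lagrange equa-
tion, we have

$$\frac{\partial H}{\partial \dot y} = -\frac{1}{2}x + \lambda\dot y\left(\dot x^2 + \dot y^2\right)^{-1/2},$$

and so

$$\frac{d}{dt}\left(\frac{\partial H}{\partial \dot y}\right) = -\frac{1}{2}\dot x + \frac{d}{dt}\left\{\frac{\lambda\dot y}{\sqrt{\dot x^2 + \dot y^2}}\right\}.$$

Since

$$\frac{\partial H}{\partial y} = \frac{1}{2}\dot x,$$

then

$$\frac{d}{dt}\left\{x - \frac{\lambda\dot y}{\sqrt{\dot x^2 + \dot y^2}}\right\} = 0,$$

which immediately integrates to

$$x - \frac{\lambda\dot y}{\sqrt{\dot x^2 + \dot y^2}} = C_2, \quad \text{a constant}.$$

And so, at last,

$$x - C_2 = \frac{\lambda\dot y}{\sqrt{\dot x^2 + \dot y^2}}.$$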


Squaring and adding the $(x - C_2)$ and $(y - C_1)$ expressions gives

$$(y - C_1)^2 + (x - C_2)^2 = \frac{\lambda^2\left(\dot x^2 + \dot y^2\right)}{\dot x^2 + \dot y^2} = \lambda^2.$$

But this is just the equation for a circle (yes!) of radius $|\lambda|$ centered on
the point $(C_2, C_1)$. We can, of course, pick the integration constants
$C_1$ and $C_2$ to center the circle anywhere we wish. We now see, too,
that if $P$ is the given perimeter of the closed curve of maximum area,
then $2\pi|\lambda| = P$, i.e., the Lagrange multiplier constant (of previously
unknown value) has magnitude $P/2\pi$. Once again the Euler-Lagrange
formulation has formally given us what we “knew” to be the answer
to an historically important problem. But now we have a proof, and
that is what a mathematician wants. Intuition, after all, is too often
a passport to the “land of error”!
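As an illustration (not a proof) of what the theorem asserts, the sketch below compares the area of a regular $n$-gon of fixed perimeter $P$ with the circle area $P^2/4\pi$. The polygon-area formula $P^2/\{4n\tan(\pi/n)\}$ is standard elementary geometry, assumed here rather than derived in this section, and the list of $n$ values is an arbitrary choice. Because $\tan(\pi/n) > \pi/n$, every ratio printed should fall below 1 and creep toward 1 as $n$ grows.

```python
import math

def regular_polygon_area(n: int, perimeter: float) -> float:
    """Area of a regular n-gon with the given perimeter: P^2 / (4 n tan(pi/n))."""
    return perimeter ** 2 / (4 * n * math.tan(math.pi / n))

P = 1.0                                  # any fixed perimeter will do
circle_area = P ** 2 / (4 * math.pi)     # circle of radius P/(2 pi)
ns = [3, 4, 6, 12, 100, 10_000]          # illustrative numbers of sides
ratios = [regular_polygon_area(n, P) / circle_area for n in ns]
for n, r in zip(ns, ratios):
    print(f"n = {n:6d}   area / circle area = {r:.10f}")
```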

Now that the mathematical certainty of the isoperimetric theo- 
rem has been established, let me end this section with a challenge 
problem for you. Unlike the other challenges in this book, this one 
does not come with an answer, because I don’t have one. And yet, 
it should require only first-year calculus. To start, consider figure
6.17, which shows an ellipse (divided into four quarters) with semi-axes
of lengths $a$ and $b$. The area of this ellipse is (a standard
freshman calculus problem) given by $\pi ab$. In figure 6.18, the
four quarters have been rearranged to form a new figure with area
$\pi ab + (a - b)^2$. The crucial observation about these two figures is
that they have the same perimeter (I'll call it $P$).




FIGURE 6.17. An ellipse.


FIGURE 6.18. The ellipse of figure 6.17 quartered and rearranged (same
perimeter but increased area).
The parametric equations for the ellipse are

$$x(t) = -a\cos(t), \qquad y(t) = b\sin(t),$$


equations that describe a point traveling in a clockwise sense along
the boundary edge of the ellipse. The point makes one complete
orbit of the perimeter in a time interval of $2\pi$ (see appendix F). From
the formula for the length of a curve (see section 6.3), we then have


$$P = \int_0^{2\pi}\sqrt{(dx)^2 + (dy)^2} = \int_0^{2\pi}\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt = \int_0^{2\pi}\sqrt{a^2\sin^2(t) + b^2\cos^2(t)}\,dt.$$


Now, the isoperimetric theorem says that the area of a plane
region with a perimeter of $P = 2\pi R$ cannot exceed the area of a
circle with radius $R$. That is,

$$A \le \pi R^2 = \pi\left(\frac{P}{2\pi}\right)^2 = \frac{P^2}{4\pi}.$$


Thus, $P \ge \sqrt{4\pi A}$, and so, using the area of figure 6.18 for $A$, we
have

$$\int_0^{2\pi}\sqrt{a^2\sin^2(t) + b^2\cos^2(t)}\,dt \ge \sqrt{4\pi\left\{\pi ab + (a - b)^2\right\}},$$
0 


where the equality obviously holds when a = b. 
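Numerical evidence is not a derivation, but it is easy to check the inequality for particular values of $a$ and $b$. The sketch below evaluates the perimeter integral with the trapezoidal rule (a choice made here for illustration, not a method from the text; for a smooth periodic integrand it converges very quickly) and compares it with the right-hand side. The sample pairs $(a, b)$ are arbitrary assumptions.

```python
import math

def ellipse_perimeter(a: float, b: float, n: int = 4000) -> float:
    """P = integral over [0, 2 pi] of sqrt(a^2 sin^2 t + b^2 cos^2 t) dt.

    Trapezoidal rule: for a 2*pi-periodic integrand the two endpoint terms
    coincide, so the rule is h times the sum of n equally spaced samples.
    """
    h = 2 * math.pi / n
    return h * sum(math.hypot(a * math.sin(k * h), b * math.cos(k * h)) for k in range(n))

def lower_bound(a: float, b: float) -> float:
    """Right-hand side sqrt(4 pi {pi a b + (a - b)^2}), from the area of figure 6.18."""
    return math.sqrt(4 * math.pi * (math.pi * a * b + (a - b) ** 2))

cases = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (10.0, 1.0), (1.0, 0.1)]  # illustrative
for a, b in cases:
    P, bound = ellipse_perimeter(a, b), lower_bound(a, b)
    print(f"a = {a:5.2f}, b = {b:4.2f}:  P = {P:.6f},  bound = {bound:.6f}")
```

For $a = b$ the two columns agree at $2\pi a$, as they must; for $a \ne b$ the perimeter stays strictly above the bound in every case tried.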

Here’s the challenge: there seems to be no “easy” way to derive
this inequality directly, by manipulating the integral on the left-
hand side. That is, I can’t see how to do it. If you try your hand
at it and succeed, please write to me and tell me how you did it!


6.9 Minimal Area Surfaces, Plateau’s Problem, 
and Soap Bubbles 


In 1744, Euler solved the following purely mathematical problem, 
and thereby started an area of research that continues to this day: 
FIGURE 6.19. Surface of revolution with circular ends.


“If we connect the two given points $(x_1, y_1)$ and $(x_2, y_2)$ with a curve
$y = y(x) \ge 0$, and then revolve that curve about the x-axis, a
‘cylinder-like’ surface will be created (with circular, open ends). What
should that curve be to make the area of the surface as small as
possible?” Euler actually considered the slightly less general case of
$y_1 = y_2$ (the open circular ends have the same radius), but the case of
$y_1 \ne y_2$ offers no additional complications and so that’s the problem
I’ll discuss in this section, with reference to figure 6.19.

With ds as a differential length along the curve y = y(x), then 
the differential surface area dA swept out by revolving ds about the 
x-axis is 


$$dA = (2\pi y)\,ds = 2\pi y\sqrt{(dx)^2 + (dy)^2} = 2\pi y\sqrt{1 + (y')^2}\,dx,$$


and so the total area of the “cylinder-like” surface is

$$\int_{x_1}^{x_2} dA = 2\pi\int_{x_1}^{x_2} y\sqrt{1 + (y')^2}\,dx.$$


Thus, the $F$ to be inserted into the Euler-Lagrange equation of sec-
tion 6.4 is simply (ignoring the constant factor of $2\pi$)

$$F = y\left\{1 + (y')^2\right\}^{1/2}.$$


Since $F$ has no explicit dependence on $x$, we can use Beltrami’s
partially integrated form of the Euler-Lagrange equation,

$$F - y'\frac{\partial F}{\partial y'} = \text{constant} = C_1.$$


Now

$$\frac{\partial F}{\partial y'} = y\,\frac{1}{2}\left\{1 + (y')^2\right\}^{-1/2} 2y' = \frac{yy'}{\sqrt{1 + (y')^2}},$$


and so Beltrami’s identity becomes

$$y\sqrt{1 + (y')^2} - \frac{y(y')^2}{\sqrt{1 + (y')^2}} = C_1,$$

or

$$\boxed{\frac{y}{\sqrt{1 + (y')^2}} = C_1.}$$


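I’ve put this result in a box as I'll be referring back to it soon.
Thus, if we square and multiply, we have

$$y^2 = C_1^2 + C_1^2(y')^2,$$

and differentiation with respect to $x$ gives

$$2yy' = 2C_1^2\,y'y'',$$

that is, $y = C_1^2 y''$, or, using the boxed result,

$$C_1\sqrt{1 + (y')^2} = C_1^2\,y'',$$

which becomes

$$y'' = \frac{1}{C_1}\sqrt{1 + (y')^2}.$$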




But this is just Bernoulli’s equation (with $k$ written instead of $C_1$) for
the catenary, derived in section 6.6! The surface formed by rotating
a catenary is called a catenoid, and so the answer to Euler’s surface
problem is seen to be intimately related to Galileo’s problem of the
hanging chain. Amazing! But this is only the start of the amazing
results that flow from Euler’s pioneering calculation.

A fascinating physical interpretation of Euler’s problem is found 
in the physics of soap films. Such films (or bubbles) are easily made 
from ordinary dishwashing detergent, warm water and, if desired, 
some glycerin to add stability to the films. Soap films have the 
property that their surface energy is proportional to their surface 
area, which means a minimum energy film (what Nature “strives” 
for) is equivalent to a minimum area film. [For a brief but quite 
interesting physics tutorial on this point, see A. Fomenko, “Minimal 
Surfaces” (Quantum, May/June 2000, pp. 4-7, 13), as well as the 
classic paper by two (husband and wife) mathematicians: Frederick J. 
Almgren, Jr., and Jean E. Taylor, “The Geometry of Soap Films and 
Soap Bubbles” (Scientific American, July 1976, pp. 82—93).] Therefore, 
to experimentally solve Euler’s problem all one need do is dip two 
circular wire rings into a soap solution and allow a film to form 
between them, as shown in figure 6.19. 

Euler’s problem is, in fact, a special case of the so-called Plateau
problem: given a contour in space, show that a surface of minimal
area bounded by that contour exists. The name comes from the
Belgian physicist Joseph Plateau (1801-83) who, over the period
1843-69, experimentally studied minimal areas using wire contours 
dipped into soap solution (more on Plateau is in the final section 
of this chapter). The problem actually dates, however, from about 
1761, when it was posed by Lagrange. Lagrange’s formulation of the 
Plateau problem asks for the demonstration of a surface of min- 
imum area for any given single contour edge, i.e., for any given 
frame consisting of a single closed length of wire. Note carefully that 
the two unconnected circular rings of Euler’s problem are not such 
a contour, and that if the centers of the two rings are sufficiently 
displaced, laterally, then a soap film will not form. More precisely, 
there is no minimal surface for Euler’s two-ring contour if the pro- 
jections of the two rings do not have some overlap [see Johannes 
C.C. Nitsche, “Plateau’s Problems and Their Modern Ramifications” 
(American Mathematical Monthly, November 1974, pp. 945-68)]. 
For a single, closed contour, however, no matter how bizarre its
twists and turns in three-dimensional space may be, the answer to 
Lagrange’s original question is yes, there is always a minimal sur- 
face. That was first proven in 1931 by the American mathemati- 
cian Jesse Douglas (1897-1965) and, independently in 1933, by the 
Hungarian-born American mathematician Tibor Rado (1895-1965). 
What Douglas and Rado provided were existence proofs, which is, of 
course, not the same thing as actually displaying the specific mini- 
mal surface that goes with a given closed contour as its edge. Specific 
minimal surfaces for given edges are generally quite difficult to find; 
for example, in 1890, H. A. Schwarz (see the box at the end of section 
2.6) found the minimal surface determined by a skew quadrilateral 
contour, as shown in figure 6.20. Nitsche’s paper, cited above, gives
the solution: its expression requires three hyperelliptic integrals,
i.e., it is complicated!


FIGURE 6.20. Surface with a skew quadrilateral boundary.

The general solution for Euler’s $y = y(x)$ curve was shown in
section 6.6 to be the hyperbolic cosine, i.e., with $C_1$ and $C_2$ as
adjustable constants (and now no length constraint),

$$y = C_1\cosh\left(\frac{x - C_2}{C_1}\right).$$

To keep things mathematically nice, let’s now drop back down to
Euler’s original problem with $y_1 = y_2$. In particular, let’s write $y_1 =
y_2 = y_0 = 1$, and position the y-axis so that $x_1 = -x_0 < 0$ and
$x_2 = x_0 > 0$. That is, the two soap rings are $2x_0$ apart. By symmetry
we have $y$ minimum at $x = 0$, and so $C_2 = 0$. Thus,

$$y = C_1\cosh\left(\frac{x}{C_1}\right).$$


To be even more particular (so we can calculate some numbers),
suppose $x_0 = \frac{1}{2}$. Then we have $y = 1$ at both ends ($x = \pm\frac{1}{2}$), and so

$$1 = C_1\cosh\left(\frac{1}{2C_1}\right).$$


This is a transcendental equation for $C_1$, which means we can’t
solve explicitly in closed form for $C_1$. We need to resort to numerical
methods to find the value of $C_1$, and this is, in fact, a perfect problem
to which we can apply the Newton-Raphson algorithm developed
in section 4.5. A plot of $f(C_1) = C_1\cosh(1/2C_1) - 1$ shows (see
figure 6.21) that there are actually two values of $C_1$ that satisfy
$f(C_1) = 0$. This may at first seem puzzling as, after all, the minimal
area surface would seem to be a unique surface. Do two solutions to
$f(C_1) = 0$ mean that a soap film can be either one of two possible
shapes? The answer is no, and this will be explained by the end
of this section. For now, however, the plot in figure 6.21 gives us
initial guesses with which to start the Newton-Raphson algorithm,
which fine-tunes the solutions of $f(C_1) = 0$ to $C_1 = 0.235$ and
$C_1 = 0.848$.
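The Newton-Raphson refinement just described can be sketched in a few lines. This is an illustrative implementation, not the book's own program from section 4.5: it writes $f(C_1) = C_1\cosh(x_0/C_1) - 1$ for general $x_0$, uses $f'(C_1) = \cosh(u) - u\sinh(u)$ with $u = x_0/C_1$, and takes starting guesses of 0.2 and 0.9 (assumed values, read off a plot like figure 6.21). The last line scans a grid of $C_1$ values for $x_0 = 1$ to show that $f$ then stays positive, i.e., has no real root.

```python
import math

def f(c1: float, x0: float) -> float:
    """f(C1) = C1*cosh(x0/C1) - 1; for x0 = 1/2 this is C1*cosh(1/(2 C1)) - 1."""
    return c1 * math.cosh(x0 / c1) - 1

def f_prime(c1: float, x0: float) -> float:
    """df/dC1 = cosh(x0/C1) - (x0/C1)*sinh(x0/C1)."""
    u = x0 / c1
    return math.cosh(u) - u * math.sinh(u)

def newton_raphson(c1: float, x0: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Refine an initial guess c1 for a root of f(C1) = 0."""
    for _ in range(max_iter):
        step = f(c1, x0) / f_prime(c1, x0)
        c1 -= step
        if abs(step) < tol:
            return c1
    raise RuntimeError("Newton-Raphson did not converge")

x0 = 0.5  # half the ring separation: the rings are 2*x0 = 1 apart
roots = [newton_raphson(guess, x0) for guess in (0.2, 0.9)]  # assumed starting guesses
smallest_f_when_x0_is_1 = min(f(c / 1000, 1.0) for c in range(50, 5001))
print(roots, smallest_f_when_x0_is_1)
```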

FIGURE 6.21. $f(C_1) = C_1\cosh(1/2C_1) - 1$.


If we separate the two rings even more, from $x_0 = \frac{1}{2}$ to $x_0 = 1$,
then we have yet another surprise waiting for us. With $x_0 = 1$, our
condition on $C_1$ becomes (at the rings where $y = y_0 = 1$)

$$1 = C_1\cosh\left(\frac{1}{C_1}\right).$$


As figure 6.22 shows, there is now no real solution for $C_1$, i.e., the
plot of $f(C_1) = C_1\cosh(1/C_1) - 1$ never crosses the $C_1$ axis! We
can understand what this means, mathematically, as follows. As $x_0$
increases from $\frac{1}{2}$ to 1, the $f(C_1)$ curve “rises upward” and, at some
critical value of $x_0$ (call it $\hat{x}_0$), the two crossings of the $C_1$ axis merge
together into a double root. For $x_0 > \hat{x}_0$ the $f(C_1)$ curve rises above
the $C_1$ axis and there are no crossings (no real solutions). What is
happening, physically, as we increase $x_0$, is that at $x_0 = \hat{x}_0$ the soap
film breaks, and for $x_0 > \hat{x}_0$ there is no cylindrical soap film surface
connecting the two rings.

FIGURE 6.22. $f(C_1) = C_1\cosh(1/C_1) - 1$.


So, what happens to the soap film after it breaks when $x_0$ exceeds
$\hat{x}_0$? The answer (verified experimentally) is that it forms two circular
films, one at each ring. Reducing $x_0$ to less than $\hat{x}_0$ does not, of course,
cause the catenoid surface to reappear, and so the breaking of the
soap film is both sudden and irreversible. This discontinuous behav- 
ior is called the Goldschmidt solution, after the German mathemati- 
cian C.W.B. Goldschmidt (1807-51) who discovered it (on paper) 
in 1831. 

We can calculate the value of $\hat{x}_0$, the maximum value of $x_0$ that
can support a catenoid minimum area soap film surface in Euler’s
problem, as follows. We have, from before, that at the circular rings
(where $x = \pm x_0$ and $y = 1$),

$$\frac{1}{C_1} = \cosh\left(\frac{x_0}{C_1}\right).$$

We also know from our earlier discussion that a plot (for a given $x_0$)
of $f(C_1) = \cosh(x_0/C_1) - (1/C_1)$ will, in general, have two solutions
to $f(C_1) = 0$. When $x_0 = \hat{x}_0$, however, the plot of $f(C_1)$ will just
touch the $C_1$ axis at a single point. This means the $C_1$ axis is tangent
to the $f(C_1)$ curve when $x_0 = \hat{x}_0$, and so
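$$\frac{df}{dC_1}\bigg|_{x_0 = \hat{x}_0} = 0.$$

So,

$$-\frac{\hat{x}_0}{C_1^2}\sinh\left(\frac{\hat{x}_0}{C_1}\right) + \frac{1}{C_1^2} = 0,$$

or

$$\hat{x}_0\sinh\left(\frac{\hat{x}_0}{C_1}\right) = 1.$$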


We also have, of course, that

$$\cosh\left(\frac{\hat{x}_0}{C_1}\right) = \frac{1}{C_1}.$$


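Dividing these two results into each other gives

$$\frac{\cosh(\hat{x}_0/C_1)}{\hat{x}_0\sinh(\hat{x}_0/C_1)} = \frac{1}{C_1},$$

or

$$\coth\left(\frac{\hat{x}_0}{C_1}\right) = \frac{\hat{x}_0}{C_1}.$$

This can be solved (numerically) to give $\hat{x}_0/C_1 = 1.1997$, and so

$$\hat{x}_0 = \frac{1}{\sinh(\hat{x}_0/C_1)} = \frac{1}{\sinh(1.1997)} = 0.6627.$$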


In summary, if we have two wire rings each of unit radius, then there
is a catenoid soap film if their separation is less than $2\hat{x}_0 = 1.3254$,
and there cannot be such a surface if their separation is greater than
1.3254.
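The critical separation can be reproduced with a short calculation. In this sketch (illustrative; plain bisection is used for the numerical solve because $\coth(u) - u$ changes sign exactly once on the bracket $[0.5, 2]$, an assumption checked by evaluating the endpoints), $u$ stands for $\hat{x}_0/C_1$. The script then recovers $\hat{x}_0 = 1/\sinh(u)$ and the corresponding $C_1 = \hat{x}_0/u$.

```python
import math

def g(u: float) -> float:
    """g(u) = coth(u) - u, with u = x0_hat / C1; its positive root marks the tangency."""
    return 1.0 / math.tanh(u) - u

def bisect(func, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

u = bisect(g, 0.5, 2.0)        # u = x0_hat / C1
x0_hat = 1.0 / math.sinh(u)    # from x0_hat * sinh(x0_hat / C1) = 1
c1 = x0_hat / u                # the (double-root) value of C1 at the critical spacing
print(u, x0_hat, 2 * x0_hat, c1)
```

As a consistency check, the resulting $C_1$ also satisfies $\cosh(\hat{x}_0/C_1) = 1/C_1$, the ring condition the derivation started from.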

Finally, to clean up the last loose end of this section, we need to
explain why (for a given ring separation of $2x_0$) there are generally
two possible values for $C_1$. To understand this, let’s calculate the
actual surface area of the soap film catenoid as a function of $C_1$.
We have, by symmetry, that this area is simply twice the area of half
of the catenoid surface, i.e., of the surface from one end to halfway
to the other end:

$$A = 2(2\pi)\int_0^{x_0} y\sqrt{1 + (y')^2}\,dx,$$

where

$$y = C_1\cosh\left(\frac{x}{C_1}\right), \qquad -x_0 \le x \le x_0.$$


But, looking back at the result (which I placed in a box at the
beginning of this section) that we got from Beltrami’s identity, we
have

$$\frac{y}{\sqrt{1 + (y')^2}} = C_1.$$

Thus,

$$A = 4\pi\int_0^{x_0}\frac{y^2}{C_1}\,dx = 4\pi C_1\int_0^{x_0}\cosh^2\left(\frac{x}{C_1}\right)dx.$$

From any good table of integrals, we find that

$$\int\cosh^2(u)\,du = \frac{\sinh(2u)}{4} + \frac{u}{2},$$

from which it immediately follows that

$$A = \pi C_1^2\sinh\left(\frac{2x_0}{C_1}\right) + 2\pi C_1 x_0.$$

For the case of $x_0 = \frac{1}{2}$, for example, we found earlier that $C_1 = 0.235$
or $C_1 = 0.848$. Evaluating $A$ for each value, we find $A(C_1 = 0.235) =
6.85$ and $A(C_1 = 0.848) = 5.99$. Thus, $C_1 = 0.848$ is the value to use
to have the minimal area catenoid, the one that is actually observed
to form.
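The two areas quoted above can be checked by evaluating the closed-form expression and, independently, by integrating $4\pi C_1\int_0^{x_0}\cosh^2(x/C_1)\,dx$ numerically. The sketch below does both; Simpson's rule and its step count are choices made for this illustration, and the $C_1$ values are the two roots quoted in the text.

```python
import math

def catenoid_area(c1: float, x0: float) -> float:
    """Closed form A = pi*C1^2*sinh(2*x0/C1) + 2*pi*C1*x0 for y = C1*cosh(x/C1) on [-x0, x0]."""
    return math.pi * c1 ** 2 * math.sinh(2 * x0 / c1) + 2 * math.pi * c1 * x0

def catenoid_area_numeric(c1: float, x0: float, n: int = 2000) -> float:
    """Cross-check: A = 4*pi*C1 * integral_0^x0 cosh^2(x/C1) dx by Simpson's rule (n even)."""
    h = x0 / n
    s = 0.0
    for k in range(n + 1):
        w = 1 if k in (0, n) else (4 if k % 2 else 2)  # Simpson weights 1, 4, 2, ..., 4, 1
        s += w * math.cosh(k * h / c1) ** 2
    return 4 * math.pi * c1 * s * h / 3

x0 = 0.5
for c1 in (0.235, 0.848):  # the two roots of 1 = C1*cosh(1/(2 C1)) quoted in the text
    print(c1, catenoid_area(c1, x0), catenoid_area_numeric(c1, x0))
```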

The discussion in this section on the Plateau problem of mini- 
mal area surfaces with a specified boundary edge has not even been 
a minimal scratch on the topmost surface of the topic (please for- 
give the outrageous pun!). Ever since Plateau’s pioneering soap film 
studies, there have been more questions than answers, and mini- 
mal surfaces will surely be an active area of mathematical research 
for many decades to come. Two of the most fundamental questions 
have, however, only recently been answered: (1) the reasons for the 
empirical Plateau rules for how soap films connect to each other, and 
(2) the wonderfully named double bubble conjecture. Each is easy to
understand, but each required a deep mathematical attack for its
solution. I’ll end this section with a paragraph on each.

His extensive examination of countless soap films led Plateau to 
the conclusion that those structures do not assume their shapes at 
random. Rather, they follow two simple rules. Either 


1. three film surfaces connect along a common edge, with the 
surfaces making 120° angles with each other, or 

2. six film surfaces connect at a common point (making four 
edges together) with an angle of about 109° between any two 


of the edges (the exact value is cos! (-3) = 109.47122...°). 
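The value $\cos^{-1}(-1/3)$ is the angle between lines drawn from the center of a regular tetrahedron to any two of its vertices. The short sketch below confirms this for one convenient (assumed) choice of four such directions, the vectors $(1,1,1)$, $(1,-1,-1)$, $(-1,1,-1)$, and $(-1,-1,1)$: every pair has dot product $-1$ and length product 3.

```python
import math

# Four directions from the center of a regular tetrahedron toward its vertices
# (one convenient choice; any rotation of these would do equally well).
directions = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def angle_deg(u, v):
    """Angle between vectors u and v, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / norm))

# All six pairs of edges meeting at the vertex
angles = [angle_deg(u, v) for i, u in enumerate(directions) for v in directions[i + 1:]]
print(angles)
print(math.degrees(math.acos(-1 / 3)))
```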


Both of these rules are illustrated in figure 6.23, which shows the 
soap film that forms on a cubical wire frame (there are a total of 
13 films meeting along common edges and/or points). These rules 
appear to explain every soap film ever observed, but that’s hardly 
a proof that they actually do. There was always the possibility that 
a sufficiently complicated wire frame might result in a film struc- 
ture not explainable by Plateau’s rules alone. Only in 1976 (as a 
follow up to her 1972 Princeton doctoral dissertation) was it at 
last proven by the American mathematician Jean Taylor (1944—  ) 
that the rules follow as necessary and sufficient consequences of 
the surface-energy-minimizing property of soap films. You can find 
more on Plateau’s rules, and their implications, in the following two 
papers: Cyril Isenberg, “Problem-Solving with Soap Films” (Physics 
Teacher, January 1977, pp. 9-18), and Dale T. Hoffman, “Smart Bub- 
bles Can Do Calculus” (Mathematics Teacher, May 1979, pp. 377-88). 
FIGURE 6.23. Plateau’s rules illustrated on a cubical frame.


The double bubble conjecture says that if two prescribed but sepa- 
rate volumes are to be enclosed by the minimum surface area, then 
two bubbles made of three portions of spherical surfaces (one in 
common, of course) is the way to do it. Many have thought the
double bubble conjecture to be obvious. For example, the brilliant
English experimentalist C. V. Boys (1855-1945), in his classic, popular
1890 book Soap Bubbles and the Forces Which Mould Them, wrote of
the spherical double bubble not as conjecture but as obvious fact.
Boys was wrong, however, and while the
conjecture is true it is not at all obvious. As astonishing as it may 
seem, just the two-dimensional version (substitute areas for volumes, 
and perimeter for surface area, and then figure 6.24 shows a planar 
double bubble for two equal prescribed areas) remained unproven 
until 1993. The three-dimensional case, for two volumes, was even 
FIGURE 6.24. Double (planar) bubble.


tougher, resisting all efforts until 2000. You can read about these 
proofs (based in large part on the efforts of a team of undergradu- 
ate college mathematics students!) in the following two papers: Joel 
Foisy et al., “The Standard Double Soap Bubble in R² Uniquely Minimizes
Perimeter” (Pacific Journal of Mathematics, May 1993, pp. 47-
58), and Frank Morgan, “Proof of the Double Bubble Conjecture” 
(American Mathematical Monthly, March 2001, pp. 193-205). 


6.10 The Human Side of Minimal Area Surfaces 


This last section to chapter 6 is a bit different from the rest of 
the book. It is all prose, with not a single equation. The reason is 
that, as I wrote the previous section on minimal surfaces and soap 
films, I came across some curious and (in one case) interlocking 
stories of the people (all mentioned in the last section) who did 
the pioneering mathematical and physical research. I wasn’t able 
to weave any of that material into the mathematical discussions, 
but instead have saved these little vignettes for a section of their 
own. I’ll start with C.W.B. Goldschmidt, the man who discovered 
the mathematics behind the breaking of the soap film solution to 
Euler’s minimal surface problem. 

Almost all books on the calculus of variations discuss the Gold- 
schmidt solution, but none (as far as I know) says anything about 
the man. My curiosity was sparked by the silence, and so I searched 
for more information. That search eventually led to the discovery
of a brief obituary notice that appeared in the 1851 volume of the
American Journal of Science (pp. 443-44). There it was reported that
Carl Wolfgang Benjamin Goldschmidt was a professor of astronomy
at the University of Göttingen (“though perhaps not a great
astronomer he was an enthusiastic and laborious one”) and served 
as an assistant to the great Gauss at the observatory there. The notice 
made the observation, now ironic considering how history turned 
out, that “Goldschmidt’s name will be long honored by those who 
never knew him.” After mentioning “his investigation of the min- 
imum surfaces of rotation of curves about a fixed axis,” the no- 
tice ends by revealing (with a typically Victorian romantic view of 
death) the nature of Goldschmidt’s early demise: “His death was 
like his life—quiet and peaceful. He had long suffered from the con- 
sequences of an enlargement of the heart; and on the morning of 
Feb. 15th, he was found in his bed, sleeping the sleep that knows 
no waking.” 

Just two years before Goldschmidt’s insight into the Euler prob- 
lem, the soap film pioneer Joseph Plateau conducted a fateful ex- 
periment that would lead to his loss of sight. At the University of 
Liège, while conducting his doctoral dissertation research in
physiological optics (in particular, the formation of images on the retina),
Plateau stared at the sun for nearly half a minute. How he managed 
to do this incredibly stupid thing without being under the influence 
of brain-deadening drugs has always mystified me, but he did. He 
ended up paying a very big price for his diploma—after temporarily 
losing his vision and then partially regaining it, by 1841 his corneas 
were severely inflamed and by 1843 he was completely and irre- 
versibly blind. A very bad state for anyone, of course, and extraordi- 
narily bad for an experimentalist. Or so one would think. His classic 
soap film experiments were just beginning, and so Plateau enlisted 
the eyes and help of his colleagues and students (at the University of 
Ghent) to make the actual observations from which were deduced 
“Plateau’s rules.” The laws governing one of Nature’s most beautiful 
displays in the everyday world are, then, due to a blind man, an 
achievement that brings to mind the creation of beautiful music by 
the deaf Beethoven. 

In 1855, even as Plateau was pondering the soap films he could 
no longer see, the man who would “popularize” them was born. 
Charles Vernon Boys eventually became famous as an inspired
experimental physicist, remembered to this day as the inventor of
(among many inventions) the Boys camera. With that camera he was 
able to photograph rifle bullets in flight (at 1,400 miles per hour!), 
along with the acoustic shock waves they produced. You can find in- 
flight bullet images produced by Boys with his nineteenth-century 
camera, still fascinating to view in the twenty-first century, in Na- 
ture (March 2, pp. 415-21, and March 9, pp. 440-46, 1893). With his 
superfast camera you could even record the very bursting of a soap 
bubble by a bullet. 

During the Christmas season of 1889-90, Boys delivered a series
of lectures to a juvenile audience at the Royal Institution (London),
and those lectures and the accompanying lantern slides were
brought out as his famous book Soap Bubbles and the Forces Which 
Mould Them. The lectures (and book) were enormous successes, and 
both displayed wonderful teaching and expository skills. Interest- 
ingly, that wasn’t always the case for Boys. In his 1934 Experiment in 
Autobiography, H. G. Wells revealed that he had been a former stu- 
dent of Boys in 1886, at the Normal School of Science (London), and 
that he had been singularly unimpressed. As Wells wrote of Boys, he 
was “[T]hen an extremely blond and largely inaudible young man 
already famous for his manipulative skill and ingenuity with soap 
bubbles. ...In those days I thought him one of the worst teach- 
ers who has ever turned his back upon a resistive audience, messed 
about with the blackboard, galloped through an hour of talk and 
bolted back to the apparatus in his private room... . Boys was too 
fast.” By the time Wells wrote those words he was world famous as 
the author of The Time Machine, War of the Worlds, and other “sci- 
entific romances,” as he called his science fiction novels. Indeed, 
Wells was far more famous than was the still-living Boys. It would 
be interesting to know what Boys thought when he read Wells’ de- 
scription of him (and it is hard to imagine that it wasn’t brought to 
his attention by someone). 

Brilliant at experiment as he was, Boys had a dark side to him, 
too; he loved to play practical jokes on people, a sophomoric activity 
that mostly amuses the jokester. Certainly his wife, Marion, was not 
amused by her husband’s antics; she put up with them from the start 
of their marriage in 1892, but finally divorced him in 1910. There is 
some speculation, even today, that Boys’ treatment of his wife might 
have strayed as far south as to be labeled abusive, but in any case the
marriage was so wounded that even before 1910 Marion had begun 
an affair with the Cambridge mathematician Andrew Forsyth (1858- 
1942). The two married after her divorce, but the resulting scandal 
forced Forsyth to resign from Cambridge. As his obituary notice 
in Nature put it, “In painful circumstances he made a marriage of 
affection, and gained ten years of a happiness for which he counted 
the loss of many old associations a price not too high.” Forsyth later 
(1927), after Marion’s death, published the influential book Calculus 
of Variations (its dedication is simply “To Marion in Remembrance”) 
but in his discussion of Euler’s minimal surface problem there is no 
mention, none at all, of soap films, bubbles, or Boys. 

And finally, the authors of the Scientific American article on soap 
bubbles that is almost universally cited by authors writing on min- 
imal surfaces are, in a sense, a mirror image reflection of Marion 
and Andrew. Jean Taylor was Frederick Almgren’s (1933-97) first 
doctoral student at Princeton, and through him was introduced to 
minimal surfaces. (Her undergraduate degree, and a master’s too, 
are in chemistry, representing scientific knowledge that certainly 
must have given her physical insight into the behavior of soap films 
that a strictly pure mathematician would lack.) Taylor and Almgren 
later married and continued their mutual work on minimal sur- 
faces, work that eventually led to Taylor’s solution to the century- 
old problem of explaining Plateau’s rules. Jean Taylor (who I suspect 
was the inspiration for Rebecca Goldstein’s fictional Princeton math 
professor Phoebe Saunders, an expert in the mathematics of soap 
films—see Strange Attractors) is presently professor of mathematics 
at Rutgers University. 


Solution to the Problem in Section 6.7 


Writing λ as the arbitrary, constant Lagrange multiplier that
allows us to apply the constraints, we wish to extremize the
integral

$$\int_0^M \{-y\ln(y) + \lambda y\}\,dx.$$

So,

$$F = -y\ln(y) + \lambda y,$$

and thus

$$\frac{\partial F}{\partial y} = -y\cdot\frac{1}{y} - \ln(y) + \lambda = -1 - \ln(y) + \lambda.$$


Since F has no y′ dependence, ∂F/∂y′ = 0 and the Euler-Lagrange
equation becomes

$$-1 - \ln(y) + \lambda = 0,$$

or

$$\ln(y) = \lambda - 1,$$

which says

$$y = e^{\lambda - 1},$$

a constant (because λ is a constant).


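We can calculate the value of this constant from the integral
constraint:

$$\int_0^M y(x)\,dx = 1 = \int_0^M e^{\lambda-1}\,dx = e^{\lambda-1}M.$$

Thus,

$$y(x) = \begin{cases} e^{\lambda-1} = \dfrac{1}{M}, & 0 \le x \le M,\\[4pt] 0, & \text{otherwise.}\end{cases}$$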


Now, how do we know this y(x) gives a maximum J, and
not a minimum? Because it is easy to demonstrate a different
y(x) that gives a J smaller than the above solution y(x) gives,
i.e., the solution y(x) does not minimize J. For the solution
y(x), we have

$$J = -\int_0^M \frac{1}{M}\ln\left(\frac{1}{M}\right)dx = \ln(M).$$

Now consider

$$y(x) = \begin{cases} \dfrac{2}{M}, & 0 \le x \le \dfrac{M}{2},\\[4pt] 0, & \text{otherwise,}\end{cases}$$

which clearly satisfies the given constraints. For this y(x), we
have

$$J = -\int_0^{M/2} \frac{2}{M}\ln\left(\frac{2}{M}\right)dx = -\frac{2}{M}\ln\left(\frac{2}{M}\right)\frac{M}{2} = -\ln\left(\frac{2}{M}\right) = \ln(M) - \ln(2) < \ln(M).$$
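As a numerical cross-check on the two values of J just computed, here is a minimal Python sketch. It assumes an illustrative interval length M = 3 (any M > 0 would do) and estimates J = −∫ y ln(y) dx with the midpoint rule for the uniform solution, for the half-interval competitor, and for a third, hypothetical step density of unit area. If the analysis above is right, the uniform density should give the largest value, ln(M).

```python
import math

def entropy_J(y, M, steps=100_000):
    """Midpoint-rule estimate of J = -integral_0^M y(x) ln y(x) dx.

    Points where y(x) = 0 contribute nothing, matching the limit y ln y -> 0.
    """
    h = M / steps
    total = 0.0
    for k in range(steps):
        v = y((k + 0.5) * h)
        if v > 0:
            total -= v * math.log(v) * h
    return total

M = 3.0  # illustrative interval length (an assumption for this sketch)

def uniform(x):
    return 1 / M                                   # the solution y(x) = 1/M

def half(x):
    return 2 / M if x <= M / 2 else 0.0            # the competitor used above

def lopsided(x):
    return 1.5 / M if x <= M / 2 else 0.5 / M      # hypothetical third density, unit area

for name, y in [("uniform", uniform), ("half", half), ("lopsided", lopsided)]:
    print(f"{name:9s} J = {entropy_J(y, M):.6f}")
print(f"ln(M)     = {math.log(M):.6f}")
```

Because both step densities change value exactly at x = M/2 and the number of steps is even, the midpoint rule is exact for all three functions here, apart from floating-point rounding.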


Historical update: The calculus of variations applications that I have 
discussed in this chapter are the ones of historical interest, but its 
use today has gone far beyond that of studying beads sliding down 
wires, and making fences of fixed length to enclose the maximum 
land. Today’s applications have mathematical structures far more 
complex than I have treated here, with differential equations and in- 
equalities serving as the constraint conditions. Such problems rou- 
tinely occur in what is called optimal control theory. To read much 
of the modern literature in that subject requires much more back- 
ground than I have assumed here, but interesting exceptions can be 
found. 

For example, consider the question of how a human runner 
should vary (i.e., control) her speed v(t) during a race of given dis- 
tance D in order to minimize her running time T. The runner starts 
from rest, of course, and so v(0) = 0. This problem was beautifully
analyzed by Joseph B. Keller in “Optimal Velocity in a Race”
(American Mathematical Monthly, May 1974, pp. 474-80). Keller
begins by writing the mathematical statement


$$D = \int_0^T v(t)\,dt,$$

along with Newton’s second law of motion (“F = ma”) as

$$\frac{dv}{dt} + \frac{v}{\tau} = f(t),$$

where v/τ is the resistive force per unit mass of the runner (τ, called
a physiological constant, is characteristic of the particular runner)
and f(t) is the propulsive force per unit mass of the runner. The
runner controls f(t) (and, hence, v(t)) with the constraint that
there is some maximum force, F, that she can exert (F is another
physiological constant characteristic of the particular runner).
Keller next writes E(t) as the oxygen available (per unit mass) 
to the runner’s muscles, and argues that oxygen is used at a rate 
proportional to the product fv (more oxygen is used the faster 
she runs and/or the harder she tries to run). On the other hand, 
oxygen is supplied at a rate proportional to yet another physiological
constant, σ, which measures the efficiency of the lungs and of the
blood circulation of the particular runner. That is,

$$\frac{dE}{dt} = \sigma - fv.$$

Finally, the runner is modeled as starting with an initial oxygen level
E₀, where of course we demand that E(t) ≥ 0 (think of what an
E(t) < 0 would mean, physically, for the runner!).

So, Keller’s problem reduces to the following mathematical
question:

given the positive constants τ, F, σ, E₀, and D, find v(t) and f(t)
such that the T in

$$D = \int_0^T v(t)\,dt, \qquad v(0) = 0$$


is minimized, subject to the constraints that 
1. (dv/dt) + (v/τ) = f(t);

2. f(t) ≤ F (Keller doesn’t say f(t) ≥ 0, but I consider that to
be an obvious requirement);

3. dE/dt = σ − fv, where E(0) = E₀, E(t) ≥ 0.


For the (lengthy but nicely explained) details for how to solve 
this fascinating calculus of variations problem, and how to use the 
solution to “explain” some world records in long-distance track- 
and-field, see Keller’s paper. 
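To see the three constraints working together, here is a minimal Python sketch that integrates Keller's equations with a simple forward-Euler step for one particular, hypothetical control: the runner pushes with her maximum force, f(t) = F, for the whole race. It reports the race time T and the lowest oxygen level reached, so you can check whether this strategy respects E(t) ≥ 0. The function name and the numbers in the example are illustrative assumptions, not Keller's fitted constants, and the sketch does not search for the optimal f(t); that is the work done in Keller's paper.

```python
def all_out_sprint(F, tau, sigma, E0, D, dt=1e-4):
    """Forward-Euler integration of Keller's model with f(t) held at its maximum F.

    Returns (T, E_min): the time needed to cover distance D, and the lowest
    oxygen level reached along the way. E_min < 0 means this particular
    strategy violates the constraint E(t) >= 0.
    """
    t, x, v, E = 0.0, 0.0, 0.0, E0      # start from rest, v(0) = 0, with E(0) = E0
    E_min = E0
    while x < D:
        f = F                            # constraint 2 (f <= F), taken with equality
        dv = (f - v / tau) * dt          # constraint 1: dv/dt + v/tau = f
        dE = (sigma - f * v) * dt        # constraint 3: dE/dt = sigma - f v
        x += v * dt                      # distance covered: dx/dt = v
        v += dv
        E += dE
        t += dt
        E_min = min(E_min, E)
    return t, E_min

# Illustrative constants only (not Keller's values).
T, E_min = all_out_sprint(F=10.0, tau=1.0, sigma=5.0, E0=1000.0, D=100.0)
print(f"T = {T:.4f}, lowest oxygen level = {E_min:.2f}")
```

With f = F from rest, constraint 1 can also be solved exactly, giving x(t) = Fτ[t − τ(1 − e^(−t/τ))]; for the made-up constants above that puts T at about 11.00, which the numerical result should match closely.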


7. 


The Modern Age Begins 


7.1 The Fermat/Steiner Problem 


With the development of the calculus of variations well under way
as mathematics entered the nineteenth century, attention was redirected to
an old problem in geometry. The problem is deceptively simple, but 
it proved to be a signpost to the future for extrema studies: given a 
triangle, as shown in figure 7.1, where is the point P inside that 
triangle that minimizes the sum of the distances from P to the 
three vertices? P is often called Steiner’s point, after the nineteenth- 
century Swiss mathematician Jacob Steiner, whose geometric work 
on the isoperimetric problem was discussed in section 2.3. In fact, 
however, the question about P greatly predates Steiner. Indeed, it 
was originally posed two centuries before Steiner, by Fermat, in his 
1629 Method for Determining Maxima and Minima and Tangents to 
Curved Lines. Torricelli in Italy (see the preface again) read this, and 
took up the challenge. 

We know that sometime before 1640 Torricelli was successful 
in locating P—and so it is occasionally called Torricelli’s point (of- 
ten called Fermat’s point, too)—because his student Vincenzo Vi- 
viani (1622-1703) published his late mentor’s geometric solution 
in his book De maximis et minimis (1659). It was their fellow Ital- 
ian, Bonaventura Francesco Cavalieri (1598-1647), however, who is 
given credit for publishing in 1647 the following interesting prop- 
erty of P, a property that reminds us immediately of one of Plateau’s 
rules for soap films: the lines connecting P to the three vertices 
meet at P at 120° angles, as long as all three of the vertex angles 
are each less than 120°. This result is so useful in modern
applications called facility location problems (e.g., where to locate the
town fire department) that a derivation of the result is instructive.

Figure 7.1. Steiner’s problem.

Curiously, a number of published analyses of P use the power of cal- 
culus to derive the 120° property, but fail to show where P actually 
is. This is odd because there is a beautiful but elementary geometric
proof (different from Torricelli’s) that both deduces the 120° rule
and shows how to locate P. What follows is based on that elegant
proof, due to the German historian of mathematics J. E. Hofmann
(1900-73), who published it in 1929.

With reference to figure 7.1, rotate the triangle APB counterclock- 
wise around B by 60°, to arrive at C’P’B, as shown in figure 7.2. P 
rotates into P’ and, in particular, PA rotates into C’P’, AB rotates 
into BC’, and PB rotates into P’B. Thus, 


$$PA + PB + PC = C'P' + PB + PC.$$


Since PB = P'B, the triangle PBP' is isosceles, which means
the base angles ∠BP'P and ∠BPP' are equal. But since the third angle
of the triangle PBP' is 60° (by construction), all three angles of
the triangle PBP' are equal (to 60°), and so triangle PBP' is more
than just isosceles: it is equilateral. (By the same sort of argument,
so is triangle AC'B.) Thus PB = P'P. So,


$$PA + PB + PC = C'P' + P'P + PC.$$

Figure 7.2. The 120° rule.


The right-hand side of this equality is, in general, a broken path
from C' to C, which of course is shortest when it is straight. That
would require ∠BPC + ∠BPP' = 180°, or

$$\angle BPC = 180^\circ - \angle BPP' = 180^\circ - 60^\circ = 120^\circ.$$

Since we could equally well have drawn AC or AB as the horizontal
side of the triangle in figure 7.1, we can conclude, too, that

$$\angle APC = \angle APB = 120^\circ.$$


The beauty of Hofmann’s analysis is that, in addition to deducing 
the 120° property, it also shows us how to actually locate P. Here’s 
how. 

As stated before, the triangle AC’B is equilateral, and it is easily 
constructed. If we now construct the analogous equilateral triangle 
on either of the other two sides of the original triangle (as shown 
in figure 7.3)—remember, any one of the three sides of triangle 
ABC could be the one drawn horizontally—then not only will the 
straight line joining C’ to C pass through P but so will the straight 
line joining the outermost vertex of the second constructed equilateral
triangle and the opposite vertex of the triangle ABC. Thus, the
intersection point of those two lines is P! These two lines (as well as
the third line connecting the third equilateral triangle’s outermost
vertex to its opposite vertex of the triangle ABC) all pass through P.

Figure 7.3. Locating P when P is inside the triangle.
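Hofmann's construction translates directly into coordinates. The minimal Python sketch below builds the outward equilateral triangles on sides AB and BC, draws the lines C'C and A'A, and returns their intersection as P. It assumes a non-degenerate triangle whose vertex angles are all less than 120° (so that P is inside), and the helper names are my own, for illustration only. A quick check on the result is that the three angles at P should each come out to 120°.

```python
import math

def rotate(p, center, angle):
    """Rotate point p counterclockwise about center by angle (in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (center[0] + c * dx - s * dy, center[1] + s * dx + c * dy)

def cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign says which side of line oa b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def outer_apex(p, q, third):
    """Apex of the equilateral triangle on side pq, drawn away from vertex `third`."""
    for angle in (math.pi / 3, -math.pi / 3):
        apex = rotate(p, q, angle)               # rotate p about q by +/-60 degrees
        if cross(p, q, apex) * cross(p, q, third) < 0:
            return apex
    raise ValueError("degenerate triangle")

def line_intersection(a1, a2, b1, b2):
    """Intersection of line a1a2 with line b1b2 (the lines must not be parallel)."""
    dax, day = a2[0] - a1[0], a2[1] - a1[1]
    dbx, dby = b2[0] - b1[0], b2[1] - b1[1]
    denom = dax * dby - day * dbx
    t = ((b1[0] - a1[0]) * dby - (b1[1] - a1[1]) * dbx) / denom
    return (a1[0] + t * dax, a1[1] + t * day)

def fermat_point(A, B, C):
    """Hofmann's construction: P is where line C'C meets line A'A."""
    C_prime = outer_apex(A, B, C)   # equilateral triangle AC'B on side AB (A rotated about B)
    A_prime = outer_apex(B, C, A)   # equilateral triangle on side BC
    return line_intersection(C_prime, C, A_prime, A)

def angle_at(P, U, V):
    """Angle UPV in degrees."""
    ux, uy = U[0] - P[0], U[1] - P[1]
    vx, vy = V[0] - P[0], V[1] - P[1]
    cos_theta = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)   # an illustrative triangle
P = fermat_point(A, B, C)
print(P, angle_at(P, A, B), angle_at(P, B, C), angle_at(P, C, A))
```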


And finally, this elegant analysis also does something else for us— 
it tells us that, if one of the angles of the original triangle ABC is 
equal to or greater than 120°, then the point P is not inside the tri- 
angle. It is easy to see this because, as the vertex angle at B increases 
toward 120°, the point P moves toward B and, when the vertex an- 
gle reaches 120° the point P is B. But what if the angle at B increases 
beyond 120°? What happens then to P? It is not hard to show in 
that case that P remains at B. Here’s why. 

Just to be different from the previous discussion, which was based 
on Hofmann’s proof, let’s now assume it is the angle at vertex A 
that is greater than 120°, and that the point P is outside the triangle 
ABC (as shown in figure 7.4). By the first assumption, the angle β
between the side AB and the straight-line extension of the side AC
is less than 60°. Now, rotate the triangle APB clockwise about A
through the angle β. Thus P rotates into P' and B rotates into B', where it is
clear that B' is on the straight-line extension of the side AC. Also,
it is equally obvious that PB = P'B' and that PA = P'A. Thus, the
triangle PAP' is an isosceles triangle with angle β at vertex A. Since
β < 60°, the equal base angles in that isosceles triangle are
each greater than 60°, which says the base side PP' < PA. Thus,
the quantity we are trying to minimize, PA + PB + PC, satisfies the
inequality

$$PA + PB + PC \ge PP' + P'B' + PC,$$

where the right-hand side is the length of the broken-line path
B'P'PC, which in turn is at least as long as the straight-line path
B'A + AC. Since the rotation about A also gives B'A = BA, that is,

$$PA + PB + PC \ge BA + AC.$$

Equality is achieved, i.e., the sum PA + PB + PC is minimized, when
P = A, and we are done.

Figure 7.4. When P is not inside the triangle.

The Fermat/Steiner problem is important today because it (or
generalizations of it) appears in many interesting problems of our
technological society. One such generalized version is called the
factory problem, for example. To make a direct connection to the
Fermat/Steiner problem, suppose there are three factories labeled A, B,
and C whose locations mark the vertices of a triangle. These fac- 
tories are to be supplied with monthly shipments of a crucial part 
from a soon-to-be-built central warehouse. If the cost of shipping 
a load of parts from the warehouse to any distant point is directly 
proportional to both the number of parts shipped and to the ship- 
ping distance, then where should the warehouse be located? If the 
warehouse is at P, and if factory A, factory B, and factory C need 
a, b, and c parts per month, respectively, then we obviously wish 
to locate P to minimize the quantity a(PA) + b(PB) + c(PC). If 
a = b = c then we have the original Fermat/Steiner problem, but 
if this condition does not hold then we have a more difficult problem.
More generally, suppose we increase the number of factories from three
to n, and write the parts required each month, by each factory, as
c₁, c₂, …, cₙ. Now we have to locate P by minimizing the quantity

$$\sum_{i=1}^{n} c_i\,(PX_i),$$

where PXᵢ is the distance between P and factory Xᵢ. For a clever (but lengthy) outline of the general solution for
the n = 3 case (c₁, c₂, and c₃ are not necessarily equal), see Irwin
Greenberg et al., “The Three Factory Problem” (Mathematics
Magazine, March-April 1965, pp. 67-72).
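The text does not say how to solve the weighted problem numerically. One standard option, swapped in here and not taken from the Greenberg paper, is Weiszfeld's iteration: repeatedly replace P by an average of the factory locations, each weighted by cᵢ divided by its current distance from P. The Python sketch below assumes planar coordinates and positive weights; the function name, tolerance, and factory data are hypothetical. When an iterate lands (numerically) on a factory, the sketch simply stops there, which suffices for these examples but is a simplification of the full method.

```python
import math

def weighted_fermat_point(points, weights, tol=1e-12, max_iter=10_000):
    """Approximately minimize sum(c_i * |P - X_i|) with Weiszfeld's iteration.

    points: factory locations X_i as (x, y) pairs; weights: positive c_i.
    """
    total = sum(weights)
    # Start at the weighted centroid of the factories.
    x = sum(c * px for (px, _), c in zip(points, weights)) / total
    y = sum(c * py for (_, py), c in zip(points, weights)) / total
    for _ in range(max_iter):
        num_x = num_y = den = 0.0
        for (px, py), c in zip(points, weights):
            dist = math.hypot(x - px, y - py)
            if dist < tol:
                # The update divides by dist, so stop if we sit on a factory
                # (a simplification of the full method).
                return (px, py)
            num_x += c * px / dist
            num_y += c * py / dist
            den += c / dist
        new_x, new_y = num_x / den, num_y / den     # distance-weighted average
        if math.hypot(new_x - x, new_y - y) < tol:
            return (new_x, new_y)
        x, y = new_x, new_y
    return (x, y)

factories = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]   # hypothetical locations A, B, C
print(weighted_fermat_point(factories, [1, 1, 1]))   # equal needs: the Fermat/Steiner point
print(weighted_fermat_point(factories, [10, 1, 1]))  # factory A's needs dominate
```

With equal weights the result should satisfy the 120° rule; with A's weight exceeding the combined pull of the others, the optimal warehouse site collapses onto factory A itself.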

As a final (and amusing) example of the Fermat/Steiner problem, 
consider the following little tale. Some years ago (1960), while con- 
structing private telephone networks to connect multiple locations 
operated by a single business customer, the Bell Telephone Com- 
pany had to deal with a curious government regulation on how 
much Bell could charge for the use of a network. Rather than basing 
its charges on the actual usage of the network (calls per month), 
they were to be calculated in proportion to the minimum length
of wire needed to construct a network that could link
all the distinct customer locations, even if that wasn’t the way the 
network was actually constructed. One of Bell’s customers was Delta 
Airlines, which had three airport sites (Atlanta, Chicago, and New 
York City) to be linked. Those sites just happen to form (approx- 
imately) the vertices of an equilateral triangle, as shown in figure 
7.5, and Bell concluded that its charges should thus be based on a 
path of wire with length 2 (measured in arbitrary units), shown in 
the solid line. 
Figure 7.5. Geometry of the Delta Airlines problem.


Delta complained, arguing that this would result in an overcharge.
Somebody at Delta had remembered the Fermat/Steiner problem!
What if, Delta asserted, it simply opened a fourth site, a ghost hub,
at the Fermat point of the three-airport triangle? Then a network
linking the three real sites and the ghost site could be built, as shown
in the dashed lines of figure 7.5. That network, which we have just
seen is of minimum length, has length √3 ≈ 1.732 (Delta’s
hypothetical path is called the Steiner span of the equilateral triangle).
That is, Delta claimed (correctly) that the minimum-length network
that could be built was 13.4% shorter than Bell’s proposed network,
and that its billing charges should be correspondingly reduced
(Delta was being overcharged by 15.5%). Bell Labs mathematicians,
of course, knew a good argument when they saw it and agreed.

If Delta had wanted to link four airport sites that had just hap- 
pened to lie on the vertices of a square, then we can see that the 
savings achieved by the Steiner span are less dramatic but still sig- 
nificant. Extending Bell’s original network idea to four points would 
give a path length of 3 (in arbitrary units), as shown in the solid line 
of figure 7.6. The Steiner span, however, in the dashed lines, uses 
two ghost hubs to achieve a length of 1 + √3 ≈ 2.732, a reduction
of 8.9%.
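The Delta numbers are easy to check. The Python sketch below places the sites at assumed coordinates (an equilateral triangle of side 1 for figure 7.5 and a unit square for figure 7.6; the exact layouts in the figures are not given, so these positions and the choice of which sides Bell's path uses are illustrative), puts the ghost hubs where the 120° rule says they must go, and totals the segment lengths.

```python
import math

def network_length(segments):
    """Total wire length of a network given as ((x1, y1), (x2, y2)) segments."""
    return sum(math.dist(p, q) for p, q in segments)

# Figure 7.5 (assumed coordinates): equilateral triangle with side 1.
s1, s2, s3 = (0.5, math.sqrt(3) / 2), (0.0, 0.0), (1.0, 0.0)
bell_triangle = [(s1, s2), (s1, s3)]                 # two sides of the triangle
hub = (0.5, math.sqrt(3) / 6)                        # Fermat point = centroid here
steiner_triangle = [(hub, s1), (hub, s2), (hub, s3)]

# Figure 7.6 (assumed coordinates): unit square, two ghost hubs on the vertical midline.
q1, q4, q2, q3 = (0.0, 1.0), (1.0, 1.0), (0.0, 0.0), (1.0, 0.0)
bell_square = [(q1, q2), (q2, q3), (q3, q4)]         # three sides of the square
h_low = (0.5, 1 / (2 * math.sqrt(3)))                # edges meet at 120 degrees
h_high = (0.5, 1 - 1 / (2 * math.sqrt(3)))
steiner_square = [(h_low, q2), (h_low, q3), (h_low, h_high), (h_high, q1), (h_high, q4)]

bt, st = network_length(bell_triangle), network_length(steiner_triangle)
bs, ss = network_length(bell_square), network_length(steiner_square)
print(f"triangle: {bt:.3f} vs {st:.3f}, shorter by {100 * (1 - st / bt):.1f}%, "
      f"overcharge {100 * (bt / st - 1):.1f}%")
print(f"square:   {bs:.3f} vs {ss:.3f}, shorter by {100 * (1 - ss / bs):.1f}%")
```

The two percentages for the triangle differ only because one compares against Bell's length and the other against Delta's: 1 − √3/2 ≈ 0.134, while 2/√3 − 1 ≈ 0.155.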

Figure 7.6. An extension of the Delta Airlines problem to two ghost hubs.

Other problems, similar in spirit to the Fermat/Steiner problem,
are treated in the paper by Bennett Eisenberg and Samir Khabbaz,
“Optimal Locations” (The College Mathematics Journal, September
1992, pp. 282-89). One that is particularly interesting concerns the
optimal location of the transmitting antenna for a radio station 
serving several towns. If we make the reasonable assumption that 
the received signal strength decreases with increasing distance from 
the antenna, and further, that we demand the signal strength at the 
town most distant from the antenna be as strong as possible, then 
we have the following mathematical problem. If A represents the
location of the antenna, and Dᵢ is the distance between A and town
i, then we want to position A so that the maximum of the Dᵢ is
minimized.
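For a handful of towns, the antenna problem can be solved by brute force. The position that minimizes the largest distance is the center of the smallest circle enclosing all the towns, and that circle is always determined either by two towns at opposite ends of a diameter or by three towns on its circumference. The Python sketch below is my own illustration, assuming flat planar coordinates and made-up town positions: it tries every such candidate circle and keeps the smallest one that contains every town. It takes roughly n⁴ operations, so it is meant for small n; faster algorithms exist.

```python
import itertools
import math

def circle_from_two(p, q):
    """Circle having segment pq as a diameter."""
    center = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return center, math.dist(center, p)

def circle_from_three(p, q, r):
    """Circumcircle of triangle pqr, or None if the points are collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    center = (ux, uy)
    return center, math.dist(center, p)

def minimax_antenna(towns, eps=1e-9):
    """Antenna site minimizing the largest town distance, plus that distance."""
    if len(towns) == 1:
        return towns[0], 0.0
    candidates = [circle_from_two(p, q) for p, q in itertools.combinations(towns, 2)]
    for trio in itertools.combinations(towns, 3):
        circle = circle_from_three(*trio)
        if circle is not None:
            candidates.append(circle)
    # Keep only circles that enclose every town, then take the smallest.
    enclosing = [(c, r) for c, r in candidates
                 if all(math.dist(c, t) <= r + eps for t in towns)]
    return min(enclosing, key=lambda cr: cr[1])

towns = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]   # hypothetical town coordinates
print(minimax_antenna(towns))
```

Note the contrast with the Fermat/Steiner problem: here only the farthest towns matter, so for an obtuse arrangement the answer is simply the midpoint of the two most distant towns.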


7.2 Digging the Optimal Trench, Paving the 
Shortest Mail Route, and Least-Cost Paths 
through Directed Graphs 


For a path-length-minimization problem of an entirely different na- 
ture than that of the Steiner/Fermat problem, consider the follow- 
ing passage that I have taken from a 1986 paper (citation to follow 
soon): 
A telephone company, while repairing buried cable, has discovered
that although the cable is buried 1 m deep, often the cable is
not directly under the marker that is supposed to be erected above 
it. They do know that the cable is always within 2 m of the marker 
in the horizontal plane. To ensure finding the cable, even when 
its direction is unknown, the repairmen dig a 1-m-deep trench in 
a circle of radius 2 m about the marker.


The geometry of this situation is shown in figure 7.7. 

In 1974 it was speculated that a trench with length shorter than 
the circumference of a circle (but still ensuring the discovery of the 
cable) could be dug as shown in the solid line of figure 7.7. It has
length 2π + 4, and so the ratio of that length to the circumference
of the circular trench is

$$\frac{2\pi + 4}{4\pi} = \frac{\pi + 2}{2\pi} \approx 0.81831,$$

i.e., the shorter trench is more than 18% shorter. It wasn’t until
1984, however, that this shorter trench geometry was proven to be
the shortest possible trench that is a continuous arc. Amazingly,
if one allows the trench to be dug as several discontinuous (i.e.,
unconnected) parts, then this ratio can be further reduced to 0.7639.
For the details elaborating on all of these statements, see the paper
by V. Faber and J. Mycielski, “The Shortest Curve That Meets All the
Lines That Meet a Convex Body” (American Mathematical Monthly,
December 1986, pp. 796-801).

Figure 7.7. Geometry of the buried telephone cable problem.

Another minimal-length problem, in a different context, is that 
of finding a shortest closed path such as is illustrated in figure 7.8. 
The path is to start on the west side (W) of a quadrilateral (the solid 
line) at the given point p, and is to eventually return to p (the points 
A, B, C, and D, the vertices of the quadrilateral, are also specified). 
The path is constrained only by the requirement that it connects 
to each of the other three sides of the quadrilateral. Where the path 
actually touches the S, E, and N sides is unspecified; we require only
that the total path length be minimum.


Figure 7.8. Shortest interior path around a quadrilateral that visits each side.

One might imagine, for example, that the N, E, and S sides
(BC, CD, and AD) of the quadrilateral represent the front property 
lines of three proposed new homes to be built facing onto a common 
green, and that p will be on the street entrance (AB) to the common 
green. The common green is, by covenant, to contain nothing but 
grass and a closed-path brick walkway allowing access to each of the 
three homes. Before laying out the walkway, the builder receives a 
request, from the post office, to make the walkway of minimum 
length, thus allowing the mail carrier to make his daily journey 
around the common green in minimum time. (Another example of 
the Postal Service ever striving for maximum efficiency!) The builder 
likes this request, too, since it minimizes the number of bricks he has 
to lay. 

There is a very clever solution to this problem, given in the paper
by R. A. Jacobson and K. L. Yocom, “Shortest Paths within Polygons”
(Mathematics Magazine, November-December 1966, pp. 290-93). If
we call the quadrilateral Q, and if we then reflect Q through side S
to get quadrilateral Q₁ (see figure 7.9), and then continue reflecting
(i.e., Q₁ is reflected through E₁ to get Q₂, and Q₂ is reflected through
N₂ to get Q₃), we then can follow point p through the reflections
(marked as p₁, p₂, and p₃). The solution is now obvious: the shortest
path connecting p to p₃ (which folds back into a closed loop in Q) is the straight path. Where
this path crosses S (= S₁), E₁ (= E₂), and N₂ (= N₃) determines the
touching points t, u, and v, respectively (see figure 7.10).

Figure 7.9. Finding the shortest path by reflection.

Figure 7.10. The shortest path is a straight line.
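The reflection argument can be carried out numerically. The Python sketch below is my own illustration, not code from the Jacobson and Yocom paper. It assumes the labeling used in the text (W = AB, N = BC, E = CD, S = AD), reflects p to get p₃, and folds the crossing points back into Q. Composing the three reflections shows that p₃ is simply p reflected across the original side N, then E, then S. The sketch assumes a convex quadrilateral for which the straight line from p to p₃ really does cross S, E₁, and N₂ within those sides, as in figure 7.10; it does not check this.

```python
import math

def reflect(p, a, b):
    """Mirror image of point p in the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    foot = (a[0] + t * dx, a[1] + t * dy)          # foot of the perpendicular from p
    return (2 * foot[0] - p[0], 2 * foot[1] - p[1])

def line_intersection(a1, a2, b1, b2):
    """Intersection of line a1a2 with line b1b2 (the lines must not be parallel)."""
    dax, day = a2[0] - a1[0], a2[1] - a1[1]
    dbx, dby = b2[0] - b1[0], b2[1] - b1[1]
    denom = dax * dby - day * dbx
    t = ((b1[0] - a1[0]) * dby - (b1[1] - a1[1]) * dbx) / denom
    return (a1[0] + t * dax, a1[1] + t * day)

def shortest_walkway(A, B, C, D, p):
    """Touching points t, u, v on S, E, N and the loop length, for p on side W = AB."""
    N, E, S = (B, C), (C, D), (A, D)                      # sides as labeled in the text
    p3 = reflect(reflect(reflect(p, *N), *E), *S)         # image of p in Q3
    t = line_intersection(p, p3, *S)                      # S1 = S, so no unfolding needed
    E1 = (reflect(C, *S), reflect(D, *S))                 # side E as it appears in Q1
    u = reflect(line_intersection(p, p3, *E1), *S)        # fold back from Q1 into Q
    N2 = tuple(reflect(reflect(q, *E), *S) for q in N)    # side N as it appears in Q2
    v = reflect(reflect(line_intersection(p, p3, *N2), *S), *E)  # fold back from Q2 into Q
    return t, u, v, math.dist(p, p3)

# Illustrative example: the unit square with W on the left, p at the midpoint of W.
t, u, v, length = shortest_walkway((0, 0), (0, 1), (1, 1), (1, 0), (0, 0.5))
print(t, u, v, length)
```

The loop p → t → u → v → p should have exactly the length of the straight segment from p to p₃; for the square example the walkway is a diamond touching the midpoint of each of the other three sides.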

As another minimum-path-length problem of yet an entirely dif- 
ferent form, consider the directed graph of figure 7.11. That graph has 
n = 8 nodes (the circled numbers) that are connected with arcs that 
are always directed from left-to-right, i.e., from a lower-numbered 
node to a higher-numbered node. Associated with each arc is a non- 
negative number, called the cost of that arc, i.e., the cost of traveling 
from the lower-numbered node to the higher-numbered node. This 
cost may be the actual distance between the two nodes (in some ar- 
bitrary units), or perhaps it is a measure of the difficulty of traversing 
the arc (as measured by some means of the analyst’s choosing). We 
will impose only one constraint on a directed graph: the nodes must 
be numbered in such a way that if node i and node j are linked by 
an arc from i to j, then i < j. This restriction prevents the existence
of endless closed-loop subpaths within the directed graph. 
Figure 7.11. A directed graph.


The problem is to determine, among all of the possible ways to 
travel from node 1 at the far left to node 8 at the far right (more gen- 
erally from node 1 to node n), which path has the minimum total 
cost. The total cost of a path is defined to be the sum of the costs of 
the individual arcs that form the path. For example, the total cost 
of the path 1 → 3 → 7 → 8 is 9. But is that the minimum-total-cost
path? The answer is no. Can you see which path is the
minimum-total-cost path? Even if you can, what if instead of n = 8, we had a
directed graph with n = 100 nodes (or 10,000 nodes)? Such large directed
graphs would clearly present enormous computational challenges
if you resorted to a brute-force enumeration and comparison.
I’ll not solve this problem now, but instead let you think about it for 
a while. In section 7.5, as an illustration of dynamic programming, 
I’ll show you how to easily find the minimum-total-cost path, by 
hand, even for pretty large values of n. With the aid of a computer, di- 
rected graphs with very large values of n are just as easily processed. 

We can see why something better than a simple comparison of 
the total costs of all possible paths is required by calculating what 
computer scientists call the computational burden of enumeration. 
To make this calculation general, and not specific for any particular 
directed graph, let’s imagine that every node i links to every node 
j > i (with the exception, of course, that node n is the end of the
path). We’ll call such a directed graph a complete directed graph. (In 
section 7.5 I’ll show you how directed graphs occur, in a natural 
way, in a modern production control problem.) In any particular 
directed graph where a linking arc doesn’t actually exist between 
two nodes, we can effectively remove that arc by simply giving it 
an extremely large cost, which means the minimum-total-cost path 
will surely not include that particular arc. The number of arcs, N, in 
a complete directed graph is easy to calculate. Since node 1 connects 
to all of the remaining n — 1 nodes, and since node 2 connects to all 
of the remaining n — 2 nodes, etc., we have 


$$N = (n-1) + (n-2) + \cdots + 1,$$

a sum well known to be n(n − 1)/2. (Gauss is said to have done this
calculation at age ten!) More subtle, however, is the calculation of 
the number of paths through those arcs, from node 1 to node n, in 
a complete directed graph. 

To calculate this, let’s define f(i) to be the number of paths from
node i to node n (the answer to our question is f(1)). It is clear, to
start, that since at node n − 1 we can go only to node n,

$$f(n-1) = 1.$$

What is f(n − 2)? Well, at node n − 2, we can go to just two places:
directly to node n or to node n − 1. Thus,

$$f(n-2) = 1 + f(n-1) = 1 + 1 = 2.$$

What is f(n − 3)? Well, at node n − 3, we can go to just three places:
directly to node n or to node n − 1 or to node n − 2. Thus,

$$f(n-3) = 1 + f(n-1) + f(n-2) = 1 + 1 + 2 = 4.$$

One more time. What is f(n − 4)? Well, at node n − 4, we can go to
just four places: directly to node n or to node n − 1 or to node n − 2
or to node n − 3. Thus,

$$f(n-4) = 1 + f(n-1) + f(n-2) + f(n-3) = 1 + 1 + 2 + 4 = 8.$$

The pattern is clear:

$$f(n-k) = 2^{k-1}.$$


The answer to our problem, f(1), means k = n − 1. So, in a complete
directed graph with n nodes, there are a total of

$$f(1) = 2^{n-2} = \frac{1}{4}\cdot 2^n \text{ paths.}$$

The number of paths grows exponentially with the number of nodes,
and so gets very big, very fast. For example, with just n = 35 nodes,
there are a total of

$$\frac{1}{4}\cdot 2^{35} = 8{,}589{,}934{,}592 \text{ paths.}$$


Would you want to compute the total cost of each one of them to 
find the path with minimum total cost? I didn’t think so! 
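The counting argument above is easy to confirm by machine. This Python sketch (the function names are my own) counts the arcs of a complete directed graph, evaluates the recurrence f(i) = Σ f(j) over all j > i with the convention f(n) = 1 (that convention supplies the text's "1 +" term for the direct arc to node n), and, for small n, lists every path explicitly. A path from node 1 to node n is fixed by which of the intermediate nodes 2, …, n − 1 it visits, which is another way to see the count 2ⁿ⁻².

```python
import itertools

def count_arcs(n):
    """Arcs in a complete directed graph: every node i links to every node j > i."""
    return sum(1 for i in range(1, n + 1) for j in range(i + 1, n + 1))

def count_paths(n):
    """Number of paths from node 1 to node n, via f(i) = sum of f(j) for j > i.

    Taking f(n) = 1 accounts for the direct arc i -> n (the '1 +' term in the text).
    """
    f = {n: 1}
    for i in range(n - 1, 0, -1):
        f[i] = sum(f[j] for j in range(i + 1, n + 1))
    return f[1]

def enumerate_paths(n):
    """Yield every path 1 -> ... -> n; each is a choice of intermediate nodes."""
    middle = range(2, n)
    for r in range(len(middle) + 1):
        for chosen in itertools.combinations(middle, r):
            yield (1, *chosen, n)

for n in (4, 8, 35):
    print(n, count_arcs(n), n * (n - 1) // 2, count_paths(n), 2 ** (n - 2))
print(len(list(enumerate_paths(8))))
```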


7.3 The Traveling Salesman Problem 


A characteristic of a number of modern optimal-path problems is 
that the existence of a solution is theoretically obvious by inspec- 
tion, and yet they remain unsolvable in practice. This is in dramatic 
contrast to the historically important problems discussed in the pre- 
vious chapters. For those problems it was not at all obvious, by any 
means, what the solutions might be or even if there was a solution. 
The most famous of the modern optimal-path problems is the so- 
called “traveling salesman problem,” which gets its name from the 
amusing context in which the problem is usually presented. Imagine
that a salesman, who lives in City 0, periodically drives to n
other cities to visit clients. He travels to one city after another, vis- 
iting each city exactly once, and then after seeing the nth client 
returns home to City 0. He knows the distance between each pair 
of cities, and from this knowledge he wishes to determine the par- 
ticular sequence of city visits that minimizes the total, closed-loop 
travel distance. 

If we write d(i, j) as the distance to travel from City i to City
j, then determining the total travel distance for a given ordering
of cities is a trivial exercise in addition. (By the way, notice that in
general d(i, j) ≠ d(j, i), because the roads connecting two cities
may be one-way roads of different length. Indeed, if two cities are
linked in just one direction we could have the case d(i, j) < ∞ and
d(j, i) = ∞.)

The “solution” to the traveling salesman problem is now obvious:
simply look at all permutations of the integers 1 to n that,
additionally, start and end with 0 (each such string of numbers is a
possible closed-loop path that represents a legitimate travel route), 
compute the total travel distance for each string, and select the 
string with the smallest total. This approach is certain to find the 
solution, but the problem with it is that the number of permuta- 
tions grows at an enormous rate with n. This is because, starting at 
City 0, the salesman has n choices for the city to visit first, then n—-1 
choices for the second city to visit, n — 2 choices for the third city 
to visit, etc. Finally, after visiting the last city on his list (after his 
nth choice), he returns home to City 0. So, there are a total of n! 
closed-loop candidate paths for a tour of n + 1 cities (City 0, plus 
Cities 1 through n). 
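To make the brute-force “solution” concrete, here is a minimal Python sketch (my own illustration, not taken from any reference): `tour_length` does the trivial addition for one ordering, and `brute_force_tsp` simply tries all n! orderings. The four-city distance table is made up for the example; its one infinite entry models a one-way road, so the table is deliberately not symmetric.

```python
import math
from itertools import permutations

def tour_length(d, order):
    """Closed-loop distance 0 -> order[0] -> ... -> order[-1] -> 0 (d may be asymmetric)."""
    route = (0, *order, 0)
    return sum(d[i][j] for i, j in zip(route, route[1:]))

def brute_force_tsp(d):
    """Try all n! orderings of Cities 1..n and keep the shortest closed loop."""
    n = len(d) - 1
    best = min(permutations(range(1, n + 1)), key=lambda order: tour_length(d, order))
    return best, tour_length(d, best)

# Hypothetical 4-city example (City 0 plus Cities 1-3); d[i][j] is the
# distance from City i to City j. The road between Cities 2 and 3 is
# one-way, so d[2][3] is finite but d[3][2] is infinite.
distances = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, math.inf, 0],
]

print(brute_force_tsp(distances))  # ((2, 3, 1), 21)
```

Each additional city multiplies the number of orderings by the new value of n, which is why this approach collapses long before n reaches 70.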

For n = 6 (7 cities) there are just 6! = 720 closed-loop paths to
compare. But for n = 70 (71 cities) there are 70! routes to compare,
i.e., almost \(1.2 \times 10^{100}\) routes. Increasing the tour by a factor of
10 (7 cities to 71 cities) has resulted in a supernova explosion of
the number of tours that must be compared. n! grows far more
rapidly than any exponential function of n, and for n = 70, the solution, by
brute-force enumeration, has become computationally beyond any 
computer we can imagine being built using today’s technology of 
clocked, sequential logic. So, while the traveling salesman problem 
clearly has a solution for any value of n, once n exceeds just a modest 
value, nobody can actually determine, by enumeration, what that 
solution actually is! 
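A few lines of Python show the gap between n! and an ordinary exponential such as \(2^n\) for the two tour sizes just mentioned. The arithmetic is exact integer arithmetic; only the printout rounds to scientific notation.

```python
import math

# Number of closed-loop tours (n!) versus the exponential 2**n
# for the two tour sizes discussed in the text.
for n in (6, 70):
    tours = math.factorial(n)
    print(f"n = {n}: n! = {tours:.3e}, 2**n = {2**n:.3e}")

# Dantzig's remark below: 70! exceeds 10**100.
print(math.factorial(70) > 10**100)  # True
```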

A dramatic illustration of just how absurd the brute-force enu- 
meration “solution” is was given some years ago by George Dantzig 
(1914- ), the American mathematician who in 1947 developed 
the astonishingly effective simplex algorithm for solving linear pro- 
gramming problems (the topic of the next section). In his paper 
“Reminiscences about the Origins of Linear Programming” (Oper- 
ations Research Letters, April 1982, pp. 43-48), Dantzig wrote 
Now 70! is a big number, greater than \(10^{100}\). Suppose we had an
IBM 370-168 [a very big computer in the 1980s] available at the
time of the Big Bang 15 billion years ago. Would it have been able
to look at all the 70! combinations by the year 1981? No! Suppose
instead it could examine 1 billion assignments per second? The
answer is still no. Even if the Earth were filled with such computers
all working in parallel, the answer would still be no. If, however,
there were \(10^{50}\) earths or \(10^{44}\) suns all filled with nanosecond speed
computers all programmed in parallel from the time of the Big 
Bang until the Sun grows cold, then perhaps the answer is yes. 


A large number of important problems that commonly occur 
in modern society have this same property of enormous computa- 
tional complexity if attacked head-on by brute-force enumeration. 
For example, in the above illustration, Dantzig was writing not of 
the traveling salesman problem but rather of the so-called “assign- 
ment problem”: how to assign 70 men to 70 jobs (with each man 
providing different skill levels for each job) in such a way as to get 
all 70 jobs done in minimum time. Because of the “n! problem” 
much effort has gone into discovering computationally efficient
algorithms. Two general approaches that have been developed are the
one I just mentioned, called linear programming (Dantzig’s simplex
algorithm could, in 1981, solve the 70! assignment problem in less
than a second), and the very different method of dynamic program- 
ming. To finish this book I’ll briefly discuss each in the next two 
sections. 


7.4 Minimizing with Inequalities (Linear Programming) 


In this section you’ll encounter the fundamental ideas of linear
programming, a topic on which literally hundreds of books have been
written over the last fifty years, from very elementary treatments 
to ones using mathematical techniques far beyond the level of this 
book. The presentation, here, in a single section, will necessarily be 
at the most basic level, along with some historical commentary. But 
let me clear away one common misunderstanding, immediately, be- 
fore I begin. Linear programming is a mathematical technique that 
can be (and usually is) implemented as a computer program, but that is
not what the word “programming” means. Rather, its original his- 
torical usage was as the name for the administrative task of schedul- 
ing a sequence of time-ordered events (usually with the objective 
of optimizing some measure, e.g., minimizing the total cost, or the 
total time required). Indeed, the naming of the task of writing code 
for a computer as programming derives from that historical origin, as 
after all that is what writing a computer program is—the schedul- 
ing of a time-ordered sequence of events, with each event being the 
execution of an individual instruction in the program code. Pro- 
gramming (i.e., scheduling) problems, however, were studied long 
before the first programmable electronic computers were built. 

The formal definition of the mathematical linear programming 
problem is quite simple, in principle: 


given a linear function f (called the objective function) of n independent,
nonnegative real variables \(x_1, x_2, \ldots, x_n\), along with a
system of inequalities linear in the \(x_i\), find the specific values of
the \(x_i\) that minimize (or maximize) f.


A vast number of important optimization problems in modern so- 
ciety have this structure. The mathematical study of systems of 
inequalities can be traced as far back as 1826, to the French mathe- 
matician Joseph Fourier (1768-1830). That year he published a short 
paper in which he considered a problem involving multiple inequal- 
ities that, together, define what he called an “irregular polygon” 
and what is today called a convex region. He elaborated on these 
ideas in a second paper published the following year. Fourier’s work 
was a fundamental foreshadowing of concepts basic to modern linear
programming. You can find more on what he did in two papers
by I. Grattan-Guinness, “Joseph Fourier’s Anticipation of Linear
Programming” (Operational Research Quarterly, 1970, pp. 361-64), 
and “On the Prehistory of Linear and Non-Linear Programming,” 
in The History of Modern Mathematics, vol. 3 (Academic Press 1994, 
pp. 43-89), and in the paper by H. P. Williams, “Fourier’s Method of 
Linear Programming and Its Dual” (American Mathematical Monthly, 
November 1986, pp. 681-95). 

The concepts of linear programming optimization are no longer 
limited to just scholarly journals, but have actually penetrated into 
popular fiction as well. For example, in Robert K. Tanenbaum’s 1987
novel No Lesser Plea, we find the following little speech from a lawyer 
in the San Francisco District Attorney’s Office: 


To gain maximum efficiency, we have to view the entire criminal 
justice system as a whole, and adjust the inputs of resources at 
each node so as to optimize throughput. . . . so we have developed 
a Trial Screening Profile that assigns priorities to different sorts 
of cases and generates scores. Then we can observe the trial dis- 
positions of various [assistant district attorneys] and bureaus and 
see whose scores diverge from the optimum and take corrective 
action. 


Later in the same novel we get a skeptical response from another 
lawyer who heard the above: 


Look, they’re trying to control the whole office with numbers. But 
you can’t really control anything with numbers unless you have 
a sense of what the numbers mean. Which they don’t. . . . It’s like
that story about the Russian chandelier factory. They get a quota 
from Moscow each year—make six tons of chandeliers. So they 
make one six-ton chandelier and take the rest of the year off. 


As the first example of a linear programming problem, the famous 
“diet problem” formulated by the American economist George Stigler
(1911-91) is, I think, the best choice. The diet problem is a mainstay
in today’s textbook discussions of linear programming, both
because it is obviously important and because it is easy to understand. It is not
generally easy to solve, however, at least not by hand. Imagine that 
we have two lists in front of us. One gives a number of different 
foods, their nutritional content (vitamins, minerals, fiber, fat, calo- 
ries, etc.) per unit amount, as well as the cost of each food per unit 
amount. The other list contains the minimum and/or maximum 
amounts of nutritional intake, per unit time, required by an adult 
to maintain good health. The solution to the diet problem is the 
determination of the amount of each food required, per unit time, 
to satisfy the nutritional needs at minimum cost. 

In a paper published in 1945 (“The Cost of Subsistence,” Journal 
of Farm Economics, pp. 303-14), Stigler attempted to solve the diet 
problem using actual data for Americans. For a total of 77 available 
foods in August 1939, along with nine nutritional constraints, he
arrived at a diet with a yearly cost of $39.93. It was a pretty awful 
diet (wheat flour, evaporated milk, cabbage, spinach and beans!), of 
which Stigler said “No one recommends [this diet] for anyone, let 
alone everyone.” It does sound a little like the gruel given by Dickens 
to Oliver Twist but, still, if not particularly tasty, it is a low-cost, 
nutritionally sound diet. But was it the minimum-cost nutritionally 
sound diet? Stigler was careful to not make that claim because, as he 
wrote, his “[analysis] procedure is experimental because there does 
not appear to be any direct method of finding the minimum of a 
linear function subject to linear conditions.” 

Just two years later, however, Dantzig published just such a 
method, his now famous simplex algorithm. Indeed, in the fall of 
1947, the Mathematical Tables Project (MTP) at the National Bu- 
reau of Standards used the simplex algorithm to solve Stigler’s array 
of nine inequalities in 77 nonnegative variables. [The MTP was a 
Depression-era Work Projects Administration effort that employed 
hundreds of out-of-work office clerks. See David Alan Grier, “The 
Math Tables Project of the Work Projects Administration: the Re- 
luctant Start of the Computing Era” (IEEE Annals of the History of 
Computing, no. 3, 1998, pp. 33-50)]. Using just desk calculators, it 
required almost 17,000 multiplications and divisions spread over 
more than 100 man-days of work to do the job; the least expensive, 
nutritionally sound diet was determined to cost $39.69 per year, us- 
ing not Stigler’s five foods but instead nine (wheat flour, corn meal, 
evaporated milk, peanut butter, lard, beef liver, cabbage, potatoes 
and spinach—still pretty awful!). 

Stigler’s accomplishment of getting to within 0.6% of the correct 
solution to such a complex problem was certainly impressive, in- 
deed amazing, but his trial-and-error approach would have no hope 
of producing similar success in the face of a problem with tens of 
thousands of variables and constraints. And such monster problems 
are the typical linear programming problem today, occurring in such 
applications as scheduling airline flights and routing telephone calls 
(both of which attempt to maximize a flow through a global net- 
work with varying local limits on congestion). Dantzig’s simplex 
algorithm, on the other hand, handles such problems with ease. 
Programmed on a modern home computer, for example, Stigler’s 
original diet problem is solved in the blink of an eye. 
The simplex algorithm requires linear algebra to properly explain
it, but we can appreciate what it does with the following simplified 
diet problem. Imagine that, to be healthy and grow into a fine, fat 
chicken dinner, a chicken needs to consume a minimum weekly
amount of each of three different nutrients (called A, B, and C). Let’s
further assume, to be specific, that the minimal amounts are (in 
some unit system) 60, 84, and 72, respectively. The local feed store 
stocks two brands of chicken feed, markedly different from each 
other. One is cheap because it’s low on nutrients per ounce, and 
the other is expensive because it’s high on nutrients per ounce. To 
be specific, suppose the details are 


                 Nutrients per ounce      Cost
                 A     B     C            (pennies per ounce)
Feed #1          3     7     3            10
Feed #2          2     2     6            4


From this we can see that if the farmer buys just feed #1, then, 
to provide a chicken with its minimum weekly nutrients, he must 
buy the maximum of {60/3, 84/7, 72/3} ounces = the maximum of 
{20, 12, 24} ounces = 24 ounces, at a cost of $2.40 per week. On 
the other hand, if the farmer buys just feed #2, then, to provide 
a chicken with its minimum weekly nutrients, he must buy the 
maximum of {60/2, 84/2, 72/6} ounces = the maximum of {30, 42, 12}
ounces = 42 ounces, at a cost of $1.68 per week. Obviously, if he
is going to buy just one of the feeds then feed #2 is the better (i.e., 
cheaper) pure strategy. But is that the best possible choice? That is, 
could he feed a chicken a nutritionally adequate diet for less than 
$1.68 per week if he used a mixed strategy, i.e., if he used a mix of 
the two feeds? We can answer this question by stating the problem 
as one in linear programming. 
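The pure-strategy arithmetic above is easy to mechanize. In this sketch (the names `MINIMUM`, `FEEDS`, and `pure_strategy` are my own, chosen for illustration), a single feed must be bought in whatever amount satisfies the most demanding of the three nutrient minimums; exact fractions keep the division honest.

```python
from fractions import Fraction

MINIMUM = (60, 84, 72)  # weekly minimum of nutrients A, B, C
FEEDS = {               # (nutrients A, B, C per ounce), pennies per ounce
    "feed #1": ((3, 7, 3), 10),
    "feed #2": ((2, 2, 6), 4),
}

def pure_strategy(nutrients, cents_per_ounce):
    """Ounces of a single feed needed to meet every minimum, and the weekly cost in pennies."""
    ounces = max(Fraction(need, per_ounce) for need, per_ounce in zip(MINIMUM, nutrients))
    return ounces, ounces * cents_per_ounce

for name, (nutrients, cents) in FEEDS.items():
    ounces, cost = pure_strategy(nutrients, cents)
    print(f"{name}: {ounces} ounces, {cost} pennies per week")
```

This prints 24 ounces at 240 pennies for feed #1 and 42 ounces at 168 pennies for feed #2, matching the figures above.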

If we denote the weekly amount (in ounces) of feed #1 by \(x_1\) and
of feed #2 by \(x_2\), then we can express the nutritional constraints with
the following three inequalities: 


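\[
\begin{aligned}
&\text{(a)}\quad 3x_1 + 2x_2 \ge 60 \quad \text{(nutrient A)} \\
&\text{(b)}\quad 7x_1 + 2x_2 \ge 84 \quad \text{(nutrient B)} \\
&\text{(c)}\quad 3x_1 + 6x_2 \ge 72 \quad \text{(nutrient C).}
\end{aligned}
\]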
The objective function to be minimized is the weekly cost:

\[ f(x_1, x_2) = 10x_1 + 4x_2. \]


The constraint inequalities can be written as 


\[
\begin{aligned}
x_2 &\ge 30 - \tfrac{3}{2}x_1 \\
x_2 &\ge 42 - \tfrac{7}{2}x_1 \\
x_2 &\ge 12 - \tfrac{1}{2}x_1.
\end{aligned}
\]


The geometric meaning of each inequality is easy to understand: if,
for example, we plot the line \(x_2 = 30 - \tfrac{3}{2}x_1\) (the equality version of
the first inequality), then the inequality is satisfied by any point on
the line or above the line. The three inequalities are simultaneously
satisfied by any point that is (the shaded region shown in figure
7.12) above all three lines (or on the boundary edge of that region).
All of those points together form the so-called feasible solution set. For
this problem we see that the feasible solution set is an unbounded
convex region in the first quadrant of the \(x_1, x_2\) plane. We also see
that the two pure strategies considered earlier are represented by the
point (0, 42), which is the pure strategy of using only feed #2, and
the point (24, 0), which is the pure strategy of using only feed #1. In
addition, there are two other vertex points on the boundary of the
convex feasible solution set. One, the point (6, 21), is determined
by the intersection of the A and B lines and the other point, (18, 3),
is determined by the intersection of the A and C lines.

FIGURE 7.12. Chicken diet feasible solution set (shaded), bounded by the A, B, and C constraint lines in the \(x_1, x_2\) plane.


The feasible solution set is often called a polytope or simplex 
(hence the algorithm’s name), but this is a loose use of tech- 
nical language. A simplex in n-dimensional space is the most 
elementary (minimum complexity) structure that exists in the 
complete space, e.g., a triangle in two-dimensional space is a 
simplex, while a line is not because it uses only one of the two 
dimensions available. An n-space simplex has n + 1 noncol- 
linear vertices (e.g., a triangle in 2-space has three vertices not 
all on the same line). A feasible solution set, however, as I’ll 
soon demonstrate, can have lots more than n + 1 vertices! The 
concept of the simplex greatly predates linear programming; 
it was introduced a century earlier as the prime confine by the 
great English mathematician William Kingdon Clifford (1845- 
79). Citing the two- and three-dimensional versions as the tri- 
angle and the tetrahedron for the “simplest form of confine” 
of an area and a volume, respectively, Clifford generalized the 
idea to n-dimensions, noting that the prime confine in n-space 
has n + 1 vertices. 


Let’s now plot the objective function on top of the feasible solution
set, as shown in figure 7.13, using several different constant
values for f. We see that the result is a family of parallel, straight
lines with the general equation

\[ x_2 = \frac{f}{4} - \frac{5}{2}x_1. \]

FIGURE 7.13. Chicken diet feasible solution set with objective function.

The lines are parallel, of course, since each has the same slope of
−5/2. To solve our problem geometrically, then, we simply look for
that objective function line with the smallest value of f that still
passes through the feasible solution set. Since the feasible solution
set is convex, the minimum-f line will be the line that is as far
to the left as possible, i.e., the line that passes through the feasible
solution set vertex at (\(x_1 = 6, x_2 = 21\)). Thus, the minimum-cost
weekly diet for a chicken consists of 6 ounces of feed #1 mixed with
21 ounces of feed #2, at a weekly cost of (6 × 10¢) + (21 × 4¢) = $1.44.
This is more than 14% cheaper than the cheaper of the two pure-strategy
diets.
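Because the chicken diet problem has only two variables, the claim that the minimum sits at a vertex can be checked by brute force. The sketch below is not the simplex algorithm; it is plain vertex enumeration (my own illustration): intersect every pair of the five constraint lines (the three nutrient lines plus the two axes) using Cramer's rule, keep the intersections that satisfy all the constraints, and price each one. Exact fractions avoid any rounding trouble.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint has the form a*x1 + b*x2 >= c.
CONSTRAINTS = [
    (3, 2, 60),  # (a) nutrient A
    (7, 2, 84),  # (b) nutrient B
    (3, 6, 72),  # (c) nutrient C
    (1, 0, 0),   # x1 >= 0
    (0, 1, 0),   # x2 >= 0
]
PRICES = (10, 4)  # pennies per ounce of feed #1 and feed #2

def feasible(x1, x2):
    return all(a * x1 + b * x2 >= c for a, b, c in CONSTRAINTS)

def vertices():
    """Solve every pair of constraints as equalities; keep the feasible intersections."""
    found = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:  # parallel lines have no single intersection point
            continue
        x1 = Fraction(c1 * b2 - c2 * b1, det)  # Cramer's rule
        x2 = Fraction(a1 * c2 - a2 * c1, det)
        if feasible(x1, x2):
            found.add((x1, x2))
    return sorted(found)

def cost(point):
    return PRICES[0] * point[0] + PRICES[1] * point[1]

verts = vertices()
best = min(verts, key=cost)
for x1, x2 in verts:
    print(f"({x1}, {x2}): {cost((x1, x2))} pennies per week")
```

Only four of the ten pairwise intersections survive the feasibility test: (0, 42), (6, 21), (18, 3), and (24, 0). The cheapest is (6, 21) at 144 pennies, the same answer the graphical argument gives. For thousands of variables this exhaustive approach is hopeless, which is exactly the gap the simplex algorithm fills.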

What happened, geometrically, in this problem is that each con- 
straint inequality divided a two-dimensional space in half with a 
one-dimensional line. All of those divisions together carved out a 
convex region (unbounded, in this case) of the two-dimensional
space, which we call the feasible solution set. Any point in that
set satisfies all of the constraints. The minimum of the objective
function was located at one of the extreme points of that set, i.e.,
at a vertex of the feasible solution set. The same thing happens
as we encounter problems with more than two variables. Thus,
in an n-variable problem, each constraint inequality divides an
n-dimensional space in half with an (n − 1)-dimensional surface. All
of these divisions together carve out a convex region of the n- 
dimensional space, which is the feasible solution set. The extreme 
of the objective function is located at one of the extreme, outermost 
points of that set, i.e., at a vertex of the feasible solution set. 

For a two-variable problem like the chicken diet problem, it is 
easy to literally watch all of this taking place on a flat piece of paper. 
In n-dimensional space it is not so easy to “see”! And while we do 
not have to consider all of the points in the feasible solution set, but 
only those points on the surface that are the vertices of the set, there 
are nevertheless a lot of vertices as n increases into the thousands. In 
n-dimensional space, the convex feasible solution set resembles the 
faceted face of an n-dimensional diamond; the simplex algorithm 
moves over the face from vertex to vertex with the goal of increas- 
ing/decreasing the objective function at each move. When that can 
no longer be done, the optimum vertex has been found. How the 
movement from vertex to vertex is controlled is the simplex algo- 
rithm, and, in general, is explainable only in the mathematics of lin- 
ear algebra and matrix theory; I refer you to any good book on linear 
programming—see, for example, the last paragraph of this section. 

We can calculate an upper bound on the number of vertices of
the feasible solution set as follows: if there are n variables (each
defining a nonnegativity constraint of the form \(x_i \ge 0\)), and m
additional constraints, then there may be as many as

\[ \binom{m+n}{n} = \frac{(m+n)!}{m!\,n!} \]

vertices. This claim follows from the simple idea of taking
any n of the total of m + n constraints and solving them as equalities,
thus defining the values of the \(x_i\) for a possible vertex. The reason I say
that this is an upper bound is that we may find that, for certain
selections of the n constraints from the m + n total constraints, a
true vertex is not defined. For example, in the chicken diet problem
we had n = 2 variables and m = 3 nutritional constraints (plus the
nonnegativity constraints \(x_1 \ge 0\) and \(x_2 \ge 0\), for five constraints
in all). For a problem of this size, the upper bound on the number
of feasible solution set vertices is

\[ \binom{5}{2} = \frac{5!}{3!\,2!} = 10, \]

but in fact we found in the chicken diet problem that there are
just four vertices (look back at figure 7.12). A potential vertex that
didn’t make the final cut is, for example, the solution of the equation
versions of the nutritional constraint inequalities for B and C, the
point (10, 7). That point violates the A constraint (3 × 10 + 2 × 7 = 44 < 60),
so it is clearly not a vertex of the feasible solution set. But
even if only a very small fraction of the upper bound is realized,
the number of vertices can be enormous. For example, in Stigler’s
original diet problem with n = 77 variables and m = 9 nutritional
constraints, the upper bound is nearly half a trillion, i.e.,

\[ \binom{86}{9} = \frac{86!}{77!\,9!} \approx 4.6 \times 10^{11}. \]
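The upper bound is a single binomial coefficient, so Python's `math.comb` (available since Python 3.8) evaluates it directly. In this small sketch (the helper name `vertex_upper_bound` is mine), m counts only the constraints beyond the nonnegativity ones, as in the definition above.

```python
import math

def vertex_upper_bound(n_vars, m_constraints):
    """C(m + n, n): the number of ways to pick n of the m + n constraints to hold with equality."""
    return math.comb(m_constraints + n_vars, n_vars)

print(vertex_upper_bound(2, 3))             # chicken diet: 10
print(f"{vertex_upper_bound(77, 9):.2e}")   # Stigler's diet problem
```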


For a second example of what appears to be linear programming 
(but actually isn’t), imagine a large industrial production facility 
with (initially) 1,000 identical operational machines, each of which 
can make either of two parts (I’ll call the two parts A and B). The 
manager of the facility can assign each operational machine, each 
week, to the task of making either part A or part B. The manager’s 
goal is to maximize the total profit generated by his facility over the 
next four-week period. Part B generates more profit than does Part 
A, but mechanically stresses a machine more than does Part A. He 
has to decide how to assign his operational machines, at the start of 
each new week, with the following constraints: 


1. if a machine makes part A for a week, then that machine will 
generate a profit for that week of $400. 

2. if a machine makes part B for a week, then that machine will
generate a profit for that week of $600. 

3. of all the operational machines assigned each week to make 
part A, 20% will suffer some mechanical breakdown and be 
unavailable for future assignment. 

4. of all the operational machines assigned each week to make 
part B, 40% will suffer some mechanical breakdown and be 
unavailable for future assignment. 
To formulate the manager’s problem in mathematical terms, let’s
write

\(x_i\) = number of operational machines assigned to make
part A during week i, i = 1, 2, 3, 4

\(y_i\) = number of operational machines assigned to make
part B during week i, i = 1, 2, 3, 4

and the objective function (the total, four-week profit) as

\[ f = 400(x_1 + x_2 + x_3 + x_4) + 600(y_1 + y_2 + y_3 + y_4). \]

The manager’s problem is to determine the integers \(x_1, x_2, x_3, x_4\),
\(y_1, y_2, y_3\), and \(y_4\) that maximize f, subject to the following
constraints:

\[
\begin{aligned}
x_1 + y_1 &= 1000 \\
x_2 + y_2 &= 0.8x_1 + 0.6y_1 \\
x_3 + y_3 &= 0.8x_2 + 0.6y_2 \\
x_4 + y_4 &= 0.8x_3 + 0.6y_3 \\
x_i &\ge 0, \quad i = 1, 2, 3, 4 \\
y_i &\ge 0, \quad i = 1, 2, 3, 4.
\end{aligned}
\]


The requirement that the \(x_i, y_i\) be integers is, of course, the result
of the obvious condition that operational machines come only in
whole numbers. This is a requirement that was not present in the diet
problem, and it dramatically alters the mathematical structure of
the problem. One can attempt to use linear programming to find
the \(x_i, y_i\), and if the result happens to give them as integers, then
all is well. But there is no guarantee that will happen. Indeed, when
it does happen, it is simply a lucky accident of the numbers. I'll 
solve this problem in the next section, with the entirely different 
method of dynamic programming, which will automatically give 
us the integer solution. 
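Before solving anything, it can help to see the formulation at work. The hypothetical helper below only evaluates a given assignment policy, week by week, using the profit and breakdown figures listed above; it does not search for the best policy. As an assumption of this sketch, a policy is described by the fraction of each week's operational machines put on part B. Exact fractions are used, so the sketch does not by itself enforce the integer requirement.

```python
from fractions import Fraction

PROFIT = {"A": 400, "B": 600}                          # dollars per machine-week
SURVIVES = {"A": Fraction(4, 5), "B": Fraction(3, 5)}  # 20% and 40% breakdown rates

def four_week_profit(b_shares, machines=1000):
    """Total profit when b_shares[i] is the fraction of week i's operational machines put on part B."""
    profit = 0
    for share in b_shares:
        y = machines * Fraction(share)   # machines making part B this week
        x = machines - y                 # machines making part A this week
        profit += PROFIT["A"] * x + PROFIT["B"] * y
        machines = SURVIVES["A"] * x + SURVIVES["B"] * y  # operational next week
    return profit

print(four_week_profit([0, 0, 0, 0]))  # all part A: 1180800
print(four_week_profit([1, 1, 1, 1]))  # all part B: 1305600
```

Neither of these two pure policies is claimed to be optimal; they simply show how the pool of operational machines shrinks at different rates (1000, 800, 640, 512 for part A versus 1000, 600, 360, 216 for part B).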

One might naively hope that, if linear programming arrives at a 
noninteger solution, then perhaps simply rounding that solution to 
integers will solve the problem. That, unfortunately, is not generally 
true, as shown by a simple counterexample. Suppose we wish to
maximize the objective function \(f = 40x_1 + 70x_2\), subject to the
constraints \(3x_1 + 5x_2 \le 15\) and \(x_1 + 5x_2 \le 10\), as well as that \(x_1\) and
\(x_2\) must be nonnegative integers. If we initially ignore the integer
constraint, then we will arrive at the shaded convex region of figure
7.14 as the feasible solution set (for now, ignore the circled points).
Plotting a family of parallel objective function lines shows that the
one giving the maximum f that still intersects the feasible solution
set is the one that passes through the vertex at (2.5, 1.5), i.e., \(x_1 = 2.5\)
and \(x_2 = 1.5\). Thus,

\[ f_{\max} = 40(2.5) + 70(1.5) = 100 + 105 = 205. \]


Rounding the linear programming solution to the integer-coordin- 
ate points in the feasible solution set (to the points (3,1) and (2,1)) 
might seem like the next thing to do, but neither of these is the 
solution to the so-called integer-programming problem. Here’s why. 

For the problem at hand, we can find the integer solution by 
enumeration, i.e., by simply testing all points in the feasible solution 
set with integer coordinates. There are 11 such points, the points 
circled in figure 7.14. For this small-scale two-dimensional problem, 
enumeration is only a bit tedious (in problems with more variables, 
you can see matters would get very tedious, very quickly): 


Integer coordinates    \(f = 40x_1 + 70x_2\)
(0, 0)                 0
(0, 1)                 70
(0, 2)                 140
(1, 0)                 40
(1, 1)                 110
(2, 0)                 80
(2, 1)                 150
(3, 0)                 120
(3, 1)                 190
(4, 0)                 160
(5, 0)                 200


Thus, the solution to the integer programming problem is \(x_1 = 5\),
\(x_2 = 0\), which gives \(f_{\max} = 200\). This solution is not even close to
the solution to the linear programming problem (\(x_1 = 2.5, x_2 = 1.5\)),
or to its possible “rounded” values, and so rounding is discredited.

FIGURE 7.14. Integer programming.
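The enumeration in the table is also easy to hand to a computer. This sketch (again my own illustration) checks every integer point with \(0 \le x_1 \le 5\) and \(0 \le x_2 \le 2\), a box that must contain every feasible point because \(3x_1 \le 15\) and \(5x_2 \le 10\) follow from the two constraints.

```python
from itertools import product

def feasible(x1, x2):
    # The two inequality constraints; nonnegativity comes from the ranges below.
    return 3 * x1 + 5 * x2 <= 15 and x1 + 5 * x2 <= 10

def f(x1, x2):
    return 40 * x1 + 70 * x2

points = [(x1, x2) for x1, x2 in product(range(6), range(3)) if feasible(x1, x2)]
best = max(points, key=lambda p: f(*p))

print(len(points), best, f(*best))  # 11 (5, 0) 200
print(f(2.5, 1.5))                  # linear programming optimum: 205.0
```

With two variables the grid is tiny; with many variables the number of grid points grows exponentially, which is why methods such as Gomory's, described next, were needed.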

It wasn’t until 1958 that the American mathematician Ralph
Gomory (1929- ), then an assistant professor of mathematics at
Princeton, published an algorithm for solving the integer program- 
ming problem. You can find Gomory’s breakthrough idea (the so- 
called method of cuts) discussed in the tutorial paper by Joe Wampler 
and Steve Newman, “Integer Programming” (The College Mathemat- 
ics Journal, March 1996, pp. 95-100). In 1991 Gomory wrote a fasci- 
nating historical essay on how he came to integer programming, and 
you can find that paper (“Early Integer Programming”) reprinted in
the January-February 2002 issue of Operations Research (pp. 78-81).
His original motivation came from a chance remark he heard dur- 
ing a lecture on the Navy’s use of linear programming to study its 
ship assignments within the fleet: answers like “1.3 aircraft
carriers” were of little value to planners!
Linear Programming and the Nobel Prize in Economics


The economics Nobel prize is not one of the original prizes, 
but rather is formally “The Bank of Sweden Prize in Economic 
Sciences in Memory of Alfred Nobel,” first awarded in 1969. 
In 1975 the economics prize was shared by the Soviet math- 
ematician Leonid Kantorovich (1912-86) and the Dutch-born 
American economist Tjalling Koopmans (1910-86). The cita- 
tion on their award was “for their contributions to the theory 
of optimum allocation of resources,” i.e., for linear program- 
ming. Those two men certainly were of the caliber one would 
expect for a Nobel laureate, but what of George Dantzig, who 
is the recognized inventor of the simplex algorithm? (Dantzig’s 
1975 National Medal of Science specifically cites him as the 
inventor of linear programming.) Dantzig was simply passed 
over by the Nobel awards committee, an act of stunning omis- 
sion. Even the winners alluded to this, with both mentioning 
Dantzig in their Nobel speeches. In fact, when George Stigler 
won the same prize seven years later, in 1982, he too men- 
tioned Dantzig. (Stigler’s prize was not specifically for his diet 
problem work, but rather for unrelated analyses in market be- 
havior and public regulation theory.) 

Some Nobel observers feel the Nobel awards committee ig- 
nored Dantzig because he is a mathematician, while Kantoro- 
vich and Koopmans made their marks as economists. That ar- 
gument is somewhat supported by a statement made by the 
economist Robert Dorfman in his paper “The Discovery of 
Linear Programming” (Annals of the History of Computing, July 
1984, pp. 283-95): “Linear programming is not a branch of 
mathematics. It lies in the domains of economics (both ap- 
plied and theoretical) and management.” Dorfman’s paper is, 
in many respects, an admirable piece of historical writing; in 
particular, he nicely explains the motivations behind the work 
of Kantorovich and Koopmans. But his erroneous “not math- 
ematics” claim did not pass unnoticed. Vigorous letters of re- 
buttal from pioneers in the use of linear programming were 
received and printed in the Annals (see vol. 11, no. 2, 1989, 
pp. 145-51). And even before Dorfman’s paper appeared, a 
counterexample to his assertion had appeared in a physics journal,
showing how to apply linear programming to solve certain
difficult electrical circuit problems: J. N. Boyd and P. N.
Raychowdhury, “Linear Programming Applied to a Simple Circuit”
(American Journal of Physics, May 1980, pp. 376-78). The
authors cite Dantzig’s work, not Koopmans’ or Kantorovich’s.

The snub argument is, however, somewhat weakened by 
noting that Kantorovich’s doctorate was in mathematics, and 
that the 1994 economics prize was awarded (in part) to mathe- 
matician John Nash (of “A Beautiful Mind” fame) for his purely 
mathematical work in game theory. Perhaps, in fact, Nash’s 
prize was partially motivated by a desire on the part of the No- 
bel awards committee to show there is no bias by economists 
against mathematicians. Still, wouldn’t it be more direct for 
economists to simply honor the man who invented an algorithm
used millions of times each day around the world,
mostly by economists? It should be no surprise to learn that 
the top prize among mathematicians is not the economics No- 
bel, but rather the Fields Medal (often called the “Nobel prize 
of mathematics”). 


As a fabulously successful algorithm, the simplex algorithm had 
long been viewed as the gold standard. It had also long been sus- 
pected that it is not perfect in the computationally efficient sense. 
What that means is this: define the size S of a linear programming 
problem to be the sum of the number of variables and constraint 
conditions. For example, the chicken diet problem discussed earlier 
has a size of S = 2 (variables) +3 (constraint conditions) = 5. Now, 
a computationally efficient algorithm (for any problem) is one that 
requires, at most, a solution time that increases as some polynomial 
function of S. The simplex algorithm has been shown, however, in 
its worst case, to require an exponential solution time. Such worst-case
problems, I should mention, do not actually seem to occur very
often in “real life,” and the simplex algorithm almost always works 
quite well on problems with sizes up to S = 20,000 or more. This 
is because, despite being theoretically exponential, it nearly always 
performs (i.e., converges to a solution) in a time approximately
proportional to S. This (desirable) average behavior caught Dantzig’s
early attention. In a long interview given some years ago (College
Mathematics Journal, September 1986, pp. 293-314), he said of the 
simplex algorithm: 


Most of the time it solved problems with m equations in 2m or 
3m steps—that was truly amazing. I certainly did not anticipate 
that it would turn out to be so terrific. I had no experience at 
the time with problems in higher dimensions, and I didn’t trust 
my geometrical intuition. For example, my intuition told me that 
the procedure would require too many steps wandering from one 
adjacent vertex to the next. In practice it takes few steps. In brief, 
one’s intuition in higher dimensional space is not worth a damn! 


Still, linear programming problems with sizes much larger than 
20,000 are becoming increasingly common (tens of thousands of 
constraints and hundreds of thousands of variables occur in routine 
“industrial strength” problems), and so the search started decades 
ago for alternatives to the simplex algorithm, for algorithms that 
would always execute in polynomial time. In 1979, the Soviet mathematician Leonid Khachiyan (now on the computer science faculty at Rutgers University) announced the final step to earlier work (by others) that resulted in what is called the ellipsoid algorithm, which always runs in polynomial time (in a time proportional to $S^6$). But since the simplex algorithm nearly always runs in low-power polynomial time anyway, the 1979 result was mostly of academic interest.
That interest was intense, however, and a wonderfully funny and 
insightful essay on how even usually serious people went “off the 
deep end” about the ellipsoid algorithm is by Eugene L. Lawler, 
“The Great Mathematical Sputnik of 1979” (Mathematical Intelli- 
gencer, 1980, pp. 191-98). 

The nonpolynomial time property of the simplex algorithm 
wasn’t proven until 1972, but when it was it was by the most con- 
vincing type of mathematical proof there is—the production of 
specific examples. That year the American mathematicians Victor 
Klee (1925- ) and George Minty (1929-86) published the follow- 
ing class of n-variable (pick n to be any integer greater than zero) 
linear programming problems: 


maximize $f = 2^{n-1}x_1 + 2^{n-2}x_2 + \cdots + 2x_{n-1} + x_n$ subject to the $n$ constraints (and $x_j \ge 0$ for all $j$)

$x_1 \le 5$
$4x_1 + x_2 \le 25$
$8x_1 + 4x_2 + x_3 \le 125$
$\vdots$
$2^n x_1 + 2^{n-1}x_2 + \cdots + 4x_{n-1} + x_n \le 5^n$
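The structure of this feasible set can be checked by brute force for small n. The Python sketch below is an illustration under stated assumptions, not Klee and Minty's own construction: it takes the nonnegativity conditions $x_j \ge 0$ for granted, builds one candidate vertex for each way of choosing, for every $j$, either $x_j = 0$ or constraint $j$ holding with equality, and then confirms that all $2^n$ candidates are distinct feasible points with $f$ largest at $(0, \ldots, 0, 5^n)$. It does not trace the simplex algorithm's actual path, which depends on the pivoting rule; all helper names are hypothetical.

```python
from itertools import product


def constraint_lhs(x, j):
    """Left side of Klee-Minty constraint j (1-based):
    2**j x_1 + 2**(j-1) x_2 + ... + 4 x_{j-1} + x_j."""
    return sum(2 ** (j - i + 1) * x[i - 1] for i in range(1, j)) + x[j - 1]


def klee_minty_vertices(n):
    """One point per choice, for each j, of 'x_j = 0' or 'constraint j tight'.
    The system is triangular, so each choice is solved front to back."""
    vertices = []
    for tight in product((False, True), repeat=n):
        x = []
        for j in range(1, n + 1):
            x.append(0)                                   # provisional x_j = 0
            if tight[j - 1]:
                x[j - 1] = 5 ** j - constraint_lhs(x, j)  # make constraint j tight
        vertices.append(tuple(x))
    return vertices


def is_feasible(x):
    n = len(x)
    return all(v >= 0 for v in x) and all(
        constraint_lhs(x, j) <= 5 ** j for j in range(1, n + 1))


def objective(x):
    """f = 2**(n-1) x_1 + 2**(n-2) x_2 + ... + 2 x_{n-1} + x_n."""
    n = len(x)
    return sum(2 ** (n - j) * x[j - 1] for j in range(1, n + 1))


if __name__ == "__main__":
    for n in range(1, 6):
        vs = klee_minty_vertices(n)
        best = max(vs, key=objective)
        print(n, len(set(vs)), all(is_feasible(v) for v in vs), best)
```

Exact integer arithmetic is used throughout, so the feasibility checks involve no rounding.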


There are $2^n$ vertices on the resulting feasible solution set and, starting at $x_1 = x_2 = \cdots = x_n = 0$ (which obviously both satisfies the constraints and is a vertex), Klee and Minty showed the simplex algorithm would find the vertex that maximizes $f$, namely $(0, 0, \ldots, 5^n)$, as the last vertex. As a simple example of how polynomial and exponential times compare, consider the following table of $2^S$ and $S^6$:


 S              2^S            S^6
 2                4             64
 5               32         15,625
10            1,024      1,000,000
20        1,048,576     64,000,000
29      536,870,912    594,823,321
30    1,073,741,824    729,000,000


The exponential time algorithm is actually faster ($2^S < S^6$) than the polynomial time algorithm for $2 \le S \le 29$. Of course, for $S \ge 30$, we would prefer the polynomial time algorithm. And $S = 30$ is, in fact, a pretty small linear programming problem.
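The crossover in the table can be located directly. This minimal Python sketch (the helper name `first_crossover` is hypothetical) reprints the table rows and searches for the smallest S ≥ 2 at which $2^S$ overtakes $S^6$; changing `exponent` shows how the crossover moves for other polynomial degrees.

```python
def first_crossover(exponent=6, limit=1000):
    """Smallest S >= 2 with 2**S > S**exponent, or None if none up to limit."""
    for s in range(2, limit + 1):
        if 2 ** s > s ** exponent:
            return s
    return None


# Reprint the rows of the table (comma-grouped, right-aligned).
for s in (2, 5, 10, 20, 29, 30):
    print(f"{s:>3} {2 ** s:>15,} {s ** 6:>15,}")

print("exponential first loses at S =", first_crossover())
```

With `exponent=3`, for instance, the exponential time overtakes the cubic one much earlier, which is one reason the degree of the polynomial matters as much as the word "polynomial."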

In 1984, the Indian analyst Narendra Karmarkar (then at AT&T 
Bell Laboratories and not yet 30 years old) announced an entirely 
different polynomial time algorithm, one that runs in a time proportional to $S^{3.5}$. (It is interesting to note that he was not trained as
a mathematician, but rather as an electrical engineer.) His algorithm 
seems to possess the simplex algorithm’s property of nearly always 
performing better than its absolute worst-case limit when applied 
to problems of everyday structure. For problems with sizes much 
larger than 20,000, it appears to run from 50 to 100 times faster than 
does the simplex algorithm. The Karmarkar algorithm is called an 
interior algorithm because, unlike the simplex algorithm, the search 
for the extreme vertex starts from inside the feasible solution set. 
The simplex algorithm, by contrast, remains entirely on the hyperdimensional surface of that set as it moves from vertex to vertex.

Karmarkar’s algorithm was initially viewed with great skepticism, 
not only because of the dramatic speed claims for its performance, 
but also because AT&T refused to reveal important details until after 
it received a patent on the algorithm (in 1988). There was precedent 
for this—the first U.S. software patent had been granted years before, 
in 1968. Many computer science observers, however, don’t believe 
patenting software is the best way for computer science research 
to develop. Dantzig openly published his simplex algorithm, and 
from that public accessibility came enormous productive research 
and useful knowledge. But, of course, we see today a parallel legal 
path being taken in the biotechnology fields, with companies at- 
tempting to patent the DNA codes of everything from microbes to 
humans, the very “algorithms of life”! Today, the detailed theory be- 
hind the Karmarkar algorithm is readily available: you can find it, 
the ellipsoid algorithm, and the simplex algorithm, all discussed in 
solid mathematics in the single book by the mathematician Howard 
Karloff, Linear Programming (Birkhäuser, 1991). A very nice discussion of Karmarkar’s algorithm, with some interesting applications
of it, had already appeared a few years before in the journal liter- 
ature; see Gilbert Strang, “Karmarkar’s Algorithm and Its Place in 
Applied Mathematics” (The Mathematical Intelligencer, vol. 9, no. 2, 
1987, pp. 4-10). 


7.5 Minimizing by Working Backwards 
(Dynamic Programming) 


In this final section of the book I'll discuss an important mathe- 
matical development that occurred almost at the same time as did 
the start of linear programming. When a problem can be formu- 
lated as a time-ordered sequence of decisions, then the solution 
(what those decisions should be to achieve the extreme of some 
function) can often be found with dynamic programming. The de- 
velopment and proselytizing of this mathematical theory for solv- 
ing multistage decision processes is most closely associated with 
the American mathematician Richard Bellman (1920-84). His 1957 
book Dynamic Programming (Princeton) is recognized today as a 
classic and, although now almost a half-century old, it is still a
source of fascinating problems. 

Dynamic programming is much more of an “art form” than is 
linear programming, and in that sense resembles classical analysis 
much more than does linear programming. Indeed, the simplex 
algorithm is available in a number of different (huge) standardized 
computer programs into which one need only enter the objective 
function and constraint inequalities and out comes the answer. In 
dynamic programming, by contrast, we must develop a new analysis 
for each new problem, which generally takes the form of deriving a 
functional equation characteristic of the particular problem. Here’s a 
simple but instructive example of that process for solving a problem, 
which will also illustrate the use of Bellman’s famous principle of 
optimality: 


If $P = x_1 x_2 \cdots x_n$, with $x_i \ge 0$ for $i = 1, 2, \ldots, n$, and if $\sum_{i=1}^{n} x_i = a$, a given constant, then what values for the $x_i$ maximize $P$?


It should be obvious that, for $n \ge 2$, none of the $x_i$ can be either 0 or $a$, as then $P = 0$, which is clearly not the largest $P$ possible.

Whatever the answer is, it is certainly the case that it could only be a function (at most) of $n$ (the dimension of the problem) and of $a$, as those are the only parameters in the problem. So, let’s write the maximum value of $P$ as $f_n(a)$. Now, for the trivial case of $n = 1$, i.e., for


$P = x_1, \quad x_1 \ge 0, \quad \sum_{i=1}^{1} x_i = a = x_1,$


we obviously have $f_1(a) = a$. For the somewhat more interesting case of $n = 2$, i.e., for


$P = x_1 x_2, \quad x_1 \ge 0, \quad x_2 \ge 0, \quad \sum_{i=1}^{2} x_i = a = x_1 + x_2,$


we have $x_2 = a - x_1$ and so $P = x_1(a - x_1)$, where $0 \le x_1 \le a$. It is easy to see that to maximize $P$ we should pick $x_1 = \tfrac{1}{2}a$ (and so $x_2 = \tfrac{1}{2}a$, as well). This gives $f_2(a) = \tfrac{1}{4}a^2$. How to proceed for the cases of $n \ge 3$, to find $f_3(a)$, $f_4(a)$, etc., is perhaps not so immediately clear, however. Consider the following approach.
Imagine that we have somehow found the value for $x_1$. We are left, then, with the problem of maximizing the product $x_2 x_3 \cdots x_n$ subject to the constraints

$x_i \ge 0$ for $i = 2, 3, \ldots, n$, and $\sum_{i=2}^{n} x_i = a - x_1$.

But that is exactly our original problem, except that we have reduced the problem dimension by 1 (from $n$ $x$’s to $n - 1$ $x$’s), and that the $n - 1$ $x$’s sum to $a - x_1$ (not to $a$). Thus, by the very definition of $f_n$, we can write the maximum value of the product $x_2 x_3 \cdots x_n$ as $f_{n-1}(a - x_1)$. So, to find the maximum value of $P$ for the original problem, we pick $x_1$ to maximize $x_1 f_{n-1}(a - x_1)$, i.e., we have the multiplicative recurrence


$f_n(a) = \max_{0 \le x_1 \le a} \{x_1 f_{n-1}(a - x_1)\},$


which is the characteristic functional equation for this problem. 

Observe, carefully, how we got this functional equation. We argued that the proper choices for the $n$-dimensional problem (the values for $x_1, x_2, \ldots, x_n$), what Bellman called the optimal policy, are such that the values of $x_2, x_3, \ldots, x_n$ form the optimal policy for the $(n - 1)$-dimensional problem. This is Bellman’s principle of optimality for a multistage decision process, which he stated as follows in his Dynamic Programming: “An optimal policy has the property that whatever the initial state and initial decision are [in our case, here, that is the value of $x_1$], the remaining decisions [that is, the values of $x_2, x_3, \ldots, x_n$] must constitute an optimal policy with regard to the state resulting from the first decision.”

Once we have the functional equation for a problem, the second 
phase of a dynamic programming analysis is to solve (either analyti- 
cally or, more usually, numerically) the functional equation. For our 
problem at hand, here’s how to do that analytically. (I’ll show you 
a numerical solution in my next example.) Let’s return to the case 
of n = 3, the first case for which we found the problem getting less 
easy to handle. We have, from the functional equation, that 
$f_3(a) = \max_{0 \le x_1 \le a} \{x_1 f_2(a - x_1)\}.$

But since $f_2(a) = \tfrac{1}{4}a^2$, then $f_2(a - x_1) = \tfrac{1}{4}(a - x_1)^2$, and so

$f_3(a) = \max_{0 \le x_1 \le a} \tfrac{1}{4} x_1 (a - x_1)^2.$


We can easily find the appropriate value of $x_1$ by simply setting the derivative of $\tfrac{1}{4}x_1(a - x_1)^2$, which is $\tfrac{1}{4}(a - x_1)(a - 3x_1)$, equal to zero. This gives $x_1 = \tfrac{1}{3}a$, and so

$f_3(a) = \tfrac{1}{27}a^3.$


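If we summarize our results so far, we have

$f_1(a) = a, \quad f_2(a) = \tfrac{1}{4}a^2 = \left(\tfrac{a}{2}\right)^2, \quad f_3(a) = \tfrac{1}{27}a^3 = \left(\tfrac{a}{3}\right)^3.$

It would seem to be obvious that a good guess to the general answer is

$f_k(a) = \left(\frac{a}{k}\right)^k,$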


and this is, indeed, easy to confirm by induction. We certainly know 
the conjecture is true for k = 1, 2, and 3. So, let’s assume the conjec- 
ture is true for k = n — 1 and see if that implies it is true for k =n. 
Thus, 


$f_n(a) = \max_{0 \le x_1 \le a} \{x_1 f_{n-1}(a - x_1)\} = \max_{0 \le x_1 \le a} \left\{x_1 \left(\frac{a - x_1}{n - 1}\right)^{n-1}\right\}.$


Setting the derivative of $x_1((a - x_1)/(n - 1))^{n-1}$ equal to zero gives $x_1 = a/n$, and so




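$f_n(a) = \frac{a}{n}\left(\frac{a - \frac{a}{n}}{n - 1}\right)^{n-1} = \frac{a}{n}\left(\frac{na - a}{n(n - 1)}\right)^{n-1} = \frac{a}{n}\left(\frac{a(n - 1)}{n(n - 1)}\right)^{n-1} = \left(\frac{a}{n}\right)^n,$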


which confirms the conjecture. Thus, to maximize $P$, set $x_1 = x_2 = \cdots = x_n = a/n$, which gives the maximum value of $P = x_1 x_2 \cdots x_n$ as $f_n(a) = (a/n)^n$.
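As a numerical cross-check, the functional equation can also be solved stage by stage on a grid, which is how dynamic programming is usually carried out in practice. The Python sketch below assumes, purely for illustration, that the $x_i$ are nonnegative integers, so that the maximization over $x_1$ becomes a finite search; the function name `max_products` is hypothetical. Whenever $n$ divides the total $s$, the tabulated $f_n(s)$ should equal $(s/n)^n$.

```python
def max_products(n, total):
    """f[k][s]: largest product of k nonnegative integers that sum to s.

    Integer-grid version of the functional equation
        f_k(s) = max_{0 <= x <= s} x * f_{k-1}(s - x),   with f_1(s) = s.
    """
    f = [[0] * (total + 1) for _ in range(n + 1)]   # row 0 is unused
    f[1] = list(range(total + 1))                   # f_1(s) = s
    for k in range(2, n + 1):
        for s in range(total + 1):
            # try every first value x and reuse the best (k-1)-stage answer
            f[k][s] = max(x * f[k - 1][s - x] for x in range(s + 1))
    return f


table = max_products(4, 12)
for k in range(1, 5):
    print(k, table[k][12], (12 // k) ** k)
```

Each stage reuses the previous row of the table instead of re-solving the smaller problems, which is the principle of optimality at work.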


In slightly less formal language, the principle of optimality 
is the mathematical version of what your parents always told 
you is the way to live an honorable life—“always do your best.” 
That is, you’ll make the most of what you started with if you 
always make the most of what you have left. Bellman was very 
much taken with the principle of optimality and, in his fas- 
cinating, eccentric autobiography Eye of the Hurricane (World 
Scientific), published the same year as his death (as befits a true 
autobiography), he wrote: “My first task in dynamic program- 
ming was to put it on a rigorous basis. I found that I was us- 
ing the same technique over and over to derive a functional 
equation. I decided to call this technique, ‘The principle of 
optimality.’ ” 

When a friend objected, saying, “The principle is not rigor- 
ous,” Bellman wrote that he replied “ ‘Of course not. It’s not 
even precise.’ A good principle should guide the intuition.” 
As you might gather from this, Bellman was a character! He 
was a brilliant (even though his greatest admirers also thought 
him supremely arrogant) if somewhat erratic genius, and his 
job title at the time of his death shows the broad range of his 
interests: he held a professorship at the University of South- 
ern California with joint appointments in mathematics, elec- 
trical engineering, biomedical engineering, and medicine. His 
breadth of mathematical accomplishment is illustrated by the 
recognition he received in 1979 from the world’s largest en- 
gineering professional society, the Institute of Electrical and 
Electronics Engineers (IEEE). That year Bellman received the
IEEE Medal of Honor (the most prestigious award of all in elec- 
trical engineering) for his work in dynamic programming. 


We are now in a position to understand how to use the principle 
of optimality to solve the directed-graph minimum-total-cost path 
problem from section 7.2; i.e., look back at figure 7.11. Let’s write 
c(i, j) as the cost of traveling the arc that links node i to node j 
(where you’ll recall that our convention in numbering the nodes of a directed graph is that $i < j$). Also, let’s write $f(i)$ as the minimum
total cost in traveling from node i to the terminal node (node 8 in 
figure 7.11). Obviously, then, f(8) = 0. 

Now, suppose that we have somehow arrived at node 7. There is 
only one way to travel to node 8, at a cost of c(7, 8) = 4. So, next 
to node 7 let’s write a 4 inside of a box, as shown in figure 7.15. 
Similarly, if we have somehow arrived at either node 5 or node 6, 
then, in each case, there is only one way to get to node 8 (at the 
costs of c(5, 8) = 1 and c(6, 8) = 1, respectively). So, next to nodes 
5 and 6 we write a 1 inside of a box. The numbers in the boxes 
represent the total cost to travel from a box’s node to node 8, i.e., 
$f(7) = 4$, $f(6) = 1$, $f(5) = 1$.

Continuing to work our way backwards toward node 1 (it is, after all, the value of $f(1)$, and the path that achieves that minimum total cost, that we are after), suppose next that somehow we have arrived at node 4. From there we now have more than one way to proceed; we could go to either node 5 or node 6. If we go to node 5, the total cost to travel to node 8 is $c(4, 5) + f(5) = 4 + 1 = 5$, and if we go to node 6, the total cost to travel to node 8 is $c(4, 6) + f(6) = 1 + 1 = 2$. Since $f(4)$ is the minimum total cost to travel from node 4 to node 8, we obviously have


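$f(4) = \min\left\{\begin{matrix} c(4, 5) + f(5) \\ c(4, 6) + f(6) \end{matrix}\right\} = \min\left\{\begin{matrix} 4 + 1 \\ 1 + 1 \end{matrix}\right\} = 2.$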


So, next to node 4 we write 2 in a box. What we’ve done so far is 
shown in figure 7.15, which also shows a slanted bar struck through 
each arc that is traveled in going from one node to the next (this is 
equivalent to dropping rice behind us, so we can find our way back). 
FIGURE 7.15. Solving a directed graph by working backwards.


The general process should now be clear. If we have somehow managed to arrive at node $i$, then the minimum total cost of traveling from that node to node 8 is given by

$f(i) = \min_{j} \{c(i, j) + f(j)\},$

where the minimum is taken over every node $j$ joined to node $i$ by an arc,


and $f(i)$ is the number we write in a box next to node $i$. This additive
recurrence is the dynamic programming functional equation for the 
least-cost path through a directed graph. It clearly incorporates the 
principle of optimality because, no matter how we may have gotten 
to a particular node, the path onward from that node to the terminal 
node is an optimal policy itself. If we continue to use the functional 
equation, we find that 


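$f(3) = \min\left\{\begin{matrix} c(3, 5) + f(5) \\ c(3, 7) + f(7) \end{matrix}\right\} = \min\left\{\begin{matrix} 5 + 1 \\ 6 + 4 \end{matrix}\right\} = 6,$

$f(2) = \min_{j} \{c(2, j) + f(j)\} = 7,$

$f(1) = \min\left\{\begin{matrix} c(1, 2) + f(2) \\ c(1, 3) + f(3) \\ c(1, 4) + f(4) \end{matrix}\right\} = \min\left\{\begin{matrix} 4 + 7 \\ 2 + 6 \\ 7 + 2 \end{matrix}\right\} = 8,$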


and the final result is shown in figure 7.16. The minimum cost path is $1 \to 3 \to 5 \to 8$, and the cost of that path is 8. It should be evident
by now that the functional equation for this problem would be very 
easy to code for automatic execution on a computer and, given the 
connection topology of even a very large (say, n = 1,000 nodes) 
directed graph, the least-cost path could be found quickly. 
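The backward recursion $f(i) = \min_j \{c(i, j) + f(j)\}$ does take only a few lines of code. In this Python sketch the arc costs are the ones quoted in the worked example, except that the arcs leaving node 2 are not given in the text, so a single arc $c(2, 7) = 3$ is a hypothetical stand-in chosen only to reproduce $f(2) = 7$. The function (its name is hypothetical) also records the best successor of each node, playing the role of the slanted bars, so the least-cost path can be read off forwards.

```python
import math

# Arc costs c(i, j) from the worked example. The arcs leaving node 2 are not
# recoverable from the text; c(2, 7) = 3 is a HYPOTHETICAL stand-in chosen
# only so that f(2) = 7.
COSTS = {
    (1, 2): 4, (1, 3): 2, (1, 4): 7,
    (2, 7): 3,  # hypothetical
    (3, 5): 5, (3, 7): 6,
    (4, 5): 4, (4, 6): 1,
    (5, 8): 1, (6, 8): 1, (7, 8): 4,
}


def least_cost_path(costs, start, terminal):
    """Backward recursion f(i) = min_j {c(i, j) + f(j)}, with f(terminal) = 0.

    Relies on the numbering convention i < j for every arc, so sweeping the
    nodes from high to low computes each f(j) before any f(i) needs it.
    """
    nodes = sorted({i for i, _ in costs} | {j for _, j in costs})
    f = {terminal: 0}
    best_next = {}
    for i in reversed(nodes):
        if i == terminal:
            continue
        f[i] = math.inf
        for (a, j), c in costs.items():
            if a == i and c + f[j] < f[i]:
                f[i] = c + f[j]
                best_next[i] = j          # remember the arc we "struck through"
    path = [start]
    while path[-1] != terminal:
        path.append(best_next[path[-1]])
    return f, path


f, path = least_cost_path(COSTS, 1, 8)
print(f[1], path)
```

The work grows with the number of arcs, which is why even a large directed graph is handled quickly.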

There is one curious aspect to our solution—we found the op- 
timal policy in the reverse order from which it would be actually 
implemented. That is, we worked backwards from node 8 to node 1 
to find the least-cost path from node 1 to node 8. Since our conven- 
tion for numbering the nodes means that we will encounter ever 
increasing node numbers as we move forward in time during our 
journey from node 1 to node 8, then increasing node numbers are a 
measure of increasing time. So, our numerical solution of the functional equation is a backwards-in-time process. This sort of thing is a general property of dynamic programming solutions, and it brings to mind an observation made more than 150 years ago by the Danish philosopher Søren Kierkegaard: “You can only understand life backwards, but we must live it forwards.”

FIGURE 7.16. Final dynamic programming solution to the directed graph of figure 7.15.


Our solution by dynamic programming of the directed graph 
problem is a pretty mathematical result, all by itself, but it has im- 
mediate practical applications, too. Consider, for example, the fol- 
lowing problem faced by a manufacturing firm that sells a certain 
large, expensive earth-moving machine to construction engineers. 
Suppose we are given the following information about the firm’s business practices and orders:


1. the construction of a machine takes one month;
2. if any machines are built during a month, then there is a fixed overhead production cost (independent of the number of machines constructed) of 2 units of money charged to the earth-moving budget (defining money this way keeps the numbers in the problem from becoming awkwardly large, e.g., define a unit to be $1,000);
3. the firm can construct any number of machines during any particular month (including none, with the production facility devoted that month to other manufacturing duties, and the overhead production cost charged against other budgets);
4. completed machines are shipped to customers only at the end of a month;
5. if a completed machine is not shipped at the end of a month, then it is stored on-site (as inventory) at a cost of 1 unit of money per machine per month (the month of construction does not incur a storage cost);
6. the production line for earth-moving machines is shut down during the winter months of December and January, and is available for production only during the other ten months of the year; the firm plans its operation in production cycles of 5-month duration, i.e., there are two production cycles per 12-month period;
7. at the start of each production cycle the inventory is zero;
8. the firm’s marketing department has contracts for 16 machines, to be delivered according to the following schedule for the first production cycle:
At the end of    Number of machines to be shipped
February         2
March            4
April            2
May              5
June             3


The firm’s production manager needs to determine the construc- 
tion schedule (i.e., policy) that has the minimum total overhead/ 
storage cost. That is, he needs to calculate how many machines 
should be constructed each month. One possible (extreme) policy, 
for example, would be to simply construct all 16 machines in Febru- 
ary. The cost of doing that can be easily calculated as follows: 


Month      Machines constructed   Machines stored     Machines delivered at    Cost
           during the month       during this month   the end of this month
February   16                      0                  2                         2
March       0                     14                  4                        14
April       0                     10                  2                        10
May         0                      8                  5                         8
June        0                      3                  3                         3


The total overhead/storage cost for this particular policy is then 37 
units of money. But is this the policy with the minimum total cost? 
The answer is no and, in fact, the optimal policy has a significantly 
lower cost. 

To find the least-cost policy, we can represent the manager’s prob- 
lem as finding the least-cost path through a complete directed graph. 
To see how this is done, first observe that the following two conclu- 
sions can be made from the above list of the firm’s business practices: 


1. during any given month the firm should construct just 
enough machines to fill orders for an integer (including zero) 
number of months, because to do otherwise would result in 
excess machines that will incur storage costs while waiting for 
the next delivery date; 
2. conclusion #1 implies the firm should construct new
machines only when the inventory has shrunk to zero and, 
because of business practice #3, this causes no problems. 


Now, let node $i$ in a directed graph represent the firm at the start of month $i$, where $i = 1$ corresponds to February, $i = 2$ to March, and so on, to $i = 6$ for July. (Notice that the end of June, when machines can last be shipped, is the start of July.) This gives us the graph of figure 7.17, where $c(i, j)$ is the cost of the arc joining node $i$ to node $j$. That is, $c(i, j)$ is the cost of, at the start of month $i$ (with zero inventory), constructing (and perhaps storing) enough machines to make all deliveries up to the start of month $j$. We have already, for example, calculated that $c(1, 6) = 37$. Similar calculations result in the costs shown in the figure (as shown in section 7.2, with $n = 6$ nodes there are $\tfrac{1}{2}(6)(5) = 15$ arcs in this graph).

Applying the dynamic programming procedure for directed graphs to the graph of figure 7.17 gives the result shown in figure 7.18, which shows that there are actually two production schedule policies that result in the same least cost of 10 units of money: $1 \to 2 \to 3 \to 4 \to 5 \to 6$ and $1 \to 2 \to 4 \to 5 \to 6$. The optimal (least-cost) policy is not unique. The first policy says to construct 2 machines in February, 4 in March, 2 in April, 5 in May, and 3 in June, i.e., to construct the machines to be shipped at the end of each month during that month. The second policy says to construct 2 machines in February, 6 in March, none in April, 5 in May, and 3 in June. The choice between these two policies would have to be made on issues other than the cost, e.g., perhaps having the production facility available for another project during April would tip the decision toward the second policy.

FIGURE 7.17. The directed graph of the machine construction problem.

FIGURE 7.18. Dynamic programming solution of figure 7.17.
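The whole production-scheduling calculation, arc costs plus backward recursion, fits in a short Python sketch. It assumes the cost model spelled out in the business practices above: an overhead charge in each month with any production, and 1 unit per machine for each month a finished machine waits, with no charge in the month it is built. The function names are hypothetical, and the sketch keeps every tied successor so that non-unique optimal policies are all reported.

```python
DEMAND = [2, 4, 2, 5, 3]  # machines shipped at the end of Feb, Mar, Apr, May, Jun


def arc_cost(i, j, demand, overhead=2, storage=1):
    """c(i, j): at the start of month i (zero inventory) build everything shipped
    at the ends of months i, ..., j-1. A machine shipped at the end of month m
    waits m - i months in storage (no storage charge in the build month)."""
    waiting = sum((m - i) * demand[m - 1] for m in range(i, j))
    return overhead + storage * waiting


def optimal_schedules(demand, overhead=2, storage=1):
    """Least total cost from node 1 and every least-cost path to the last node."""
    last = len(demand) + 1            # node 6 = start of July
    f = {last: 0}
    ties = {}
    for i in range(last - 1, 0, -1):  # backwards in time
        options = {j: arc_cost(i, j, demand, overhead, storage) + f[j]
                   for j in range(i + 1, last + 1)}
        f[i] = min(options.values())
        ties[i] = [j for j, v in options.items() if v == f[i]]

    def paths_from(i):
        if i == last:
            return [[last]]
        return [[i] + rest for j in ties[i] for rest in paths_from(j)]

    return f[1], paths_from(1)


def builds(path, demand):
    """Machines to construct in each month for a given node path."""
    plan = [0] * len(demand)
    for i, j in zip(path, path[1:]):
        plan[i - 1] = sum(demand[i - 1:j - 1])
    return plan


cost, paths = optimal_schedules(DEMAND)
print(cost)
for p in paths:
    print(p, builds(p, DEMAND))
```

The `overhead` and `storage` parameters can be changed to explore variations of the problem, such as the one posed next.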


A Question for You to Play With 


Suppose the fixed overhead cost is changed to 4 units of 
money. The storage cost remains at 1 unit of money per machine 
per month. Now what is the optimal production policy? The
answer is at the end of this section. 


As the final example of dynamic programming, let’s return to 
the 1,000-machine integer programming problem discussed in sec- 
tion 7.4. I left it unsolved there because the number of integer- 
valued variables (8) means the feasible solution set is a collection of 
points in hyperspace (which, as Dantzig observed, is a hard thing to 
“visualize”!). The dynamic programming formulation has no such
complications. Recall that we are to determine how many of the 
still-operational machines to allocate, at the beginning of each week 
over a four-week period, to making part A (with the remaining oper- 
ational machines assigned to making part B). As with the previous 
dynamic programming examples, we’ll solve this problem “back- 
wards in time.” 

To start, let’s define $f_k(n)$ as the maximum profit that can be made during a period of $k$ weeks that starts with $n$ operational machines. So, if we assign $x$ machines to make part A (and thus $n - x$ machines to make part B), then we can immediately write, for a one-week period ($k = 1$),

$f_1(n) = \max_{0 \le x \le n} \{400x + 600(n - x)\} = \max_{0 \le x \le n} (600n - 200x) = 600n,$

because we obviously achieve the maximum of $600n - 200x$ by setting $x = 0$. This result tells us that, at the start of week 4 (when we have just one week left), we should assign all of the then-operational machines to making part B.

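Let’s now back up to the start of week 3, i.e., we are now concerned with $f_2(n)$, the maximum profit to be made from $n$ operational machines with two weeks to go. Since $f_2(n)$ is the sum of the profits from week 3 and week 4, then if we start week 3 by assigning $x$ machines to making part A (and so $n - x$ machines to making part B), we can write

$f_2(n) = \max_{0 \le x \le n} \{[600n - 200x] + f_1[0.8x + 0.6(n - x)]\},$

where $0.8x + 0.6(n - x)$ is the number of machines still operational at the start of week 4, since 20% of the part-A machines and 40% of the part-B machines break down during week 3.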
Since $f_1(n) = 600n$, then

$f_2(n) = \max_{0 \le x \le n} \{600n - 200x + 600[0.8x + 0.6(n - x)]\} = \max_{0 \le x \le n} \{960n - 80x\} = 960n,$

as we clearly maximize $960n - 80x$ by setting $x = 0$. So, as we start week 3, we should assign all of the operational machines to making part B.

Let’s now back up to the start of week 2, i.e., we are now concerned with $f_3(n)$, the maximum profit to be made from $n$ operational machines with three weeks to go. Since $f_3(n)$ is the sum of the profits from week 2 and the final two weeks, then if we start week 2 by assigning $x$ machines to making part A (and so $n - x$ machines to making part B), we can write

$f_3(n) = \max_{0 \le x \le n} \{[600n - 200x] + f_2[0.8x + 0.6(n - x)]\}.$

Since $f_2(n) = 960n$, then

$f_3(n) = \max_{0 \le x \le n} \{600n - 200x + 960[0.8x + 0.6(n - x)]\} = \max_{0 \le x \le n} \{1{,}176n - 8x\} = 1{,}176n,$

as we clearly maximize $1{,}176n - 8x$ by setting $x = 0$. So, as we start week 2, we should assign all of the operational machines to making part B.

Finally, let’s back up to the start of week 1, i.e., we are now concerned with $f_4(n)$ (where of course now $n = 1{,}000$ for our particular problem). As usual, let’s assign $x$ machines to making part A and $n - x$ machines to making part B. Then, as before,

$f_4(n) = \max_{0 \le x \le n} \{[600n - 200x] + f_3[0.8x + 0.6(n - x)]\},$

or, as $f_3(n) = 1{,}176n$, we have

$f_4(n) = \max_{0 \le x \le n} \{600n - 200x + 1{,}176[0.8x + 0.6(n - x)]\} = \max_{0 \le x \le n} \{1{,}305.6n + 35.2x\} = 1{,}340.8n,$

as we clearly maximize $1{,}305.6n + 35.2x$ by setting $x = n$. So, as we start week 1, we should assign all of the operational machines to making part A.

Our optimal (maximum four-week profit) policy is, thus, 


start of week 1: assign all 1,000 machines to making part A;

start of week 2: assign all remaining machines (= 800) to making part B;

start of week 3: assign all remaining machines (= 480) to making part B;

start of week 4: assign all remaining machines (= 288) to making part B.


This policy will generate a total profit of $1,340,800, and all other policies would generate less profit.
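Because each stage's bracketed expression is linear in $x$, the recursion reduces to comparing the two endpoints $x = 0$ and $x = n$, and $f_k(n)$ is always a constant times $n$. This Python sketch (function names hypothetical) carries those constants in exact rational arithmetic with `fractions.Fraction`, using the stated 20% and 40% weekly breakdown rates, so the figures 600, 960, 1,176, and 1,340.8 come out exactly rather than with floating-point noise.

```python
from fractions import Fraction

SURVIVE_A = Fraction(4, 5)   # 20% of part-A machines break each week
SURVIVE_B = Fraction(3, 5)   # 40% of part-B machines break each week


def linear_policy(weeks=4, profit_a=400, profit_b=600):
    """Return (c, plan) with f_k(n) = c[k] * n and the calendar-order plan.

    The quantity maximized is linear in x, so only x = n (all machines on A)
    and x = 0 (all machines on B) need comparing at each stage.
    """
    c = [Fraction(0)]                 # f_0(n) = 0: no weeks left
    choice = []                       # choice[k - 1]: decision with k weeks left
    for _ in range(weeks):
        all_a = profit_a + c[-1] * SURVIVE_A
        all_b = profit_b + c[-1] * SURVIVE_B
        c.append(max(all_a, all_b))
        choice.append("A" if all_a > all_b else "B")
    return c, choice[::-1]            # reverse so week 1 comes first


def machines_each_week(plan, n):
    """Operational machines at the start of each week under the plan."""
    counts = []
    for part in plan:
        counts.append(n)
        n = n * (SURVIVE_A if part == "A" else SURVIVE_B)
    return counts


c, plan = linear_policy()
print(*(float(v) for v in c[1:]), plan)
print(*machines_each_week(plan, 1000), float(c[4] * 1000))
```

The stage loop runs from one week left up to four weeks left, exactly the backwards-in-time order used in the text.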

As a final comment on this problem, its solution by dynamic programming is in one important sense more general than is a solution
by linear or integer programming; dynamic programming will still 
work even if the objective function is nonlinear. For example, sup- 
pose we keep all as before except for the profit that is generated 
from making each part. Suppose now that, instead of varying lin- 
early with the amount of each part made (i.e., linearly with the 
number of machines assigned to make each part), the profit from 
each part varies quadratically with the number of machines assigned. 
This might occur, for example, from the reduced cost per part expe- 
rienced by purchasing in large quantity the raw material required to 
make the parts. The problem of determining the optimal (maximum 
profit) policy is now one of nonlinear programming and the linear 
simplex algorithm will not work. Dynamic programming, however, 
doesn’t miss a beat. 

To see this, suppose that if $n$ machines are assigned to make part A for a week, then the profit is $2n^2$, while if those $n$ machines are assigned to make part B for a week, then the profit is $3n^2$. Defining $f_1(n)$ as before, if we have $n$ operational machines at the start of week 4 and we assign $x$ of them to making part A (and $n - x$ to making part B), then we can write

$f_1(n) = \max_{0 \le x \le n} \{2x^2 + 3(n - x)^2\} = \max_{0 \le x \le n} (5x^2 - 6nx + 3n^2) = 3n^2,$
because $5x^2 - 6nx + 3n^2$ achieves its maximum value in the interval $0 \le x \le n$ at $x = 0$. This is because the quadratic expression has a minimum at an $x$ within the interval (at $x = \tfrac{3}{5}n$), and so the maximum must occur at one of the endpoint values for $x$; which one is easy to determine. For $x = n$ the quadratic is $2n^2$, and for $x = 0$ the quadratic is $3n^2$. So, the conclusion is that $x = 0$. This result says that at the start of week 4 we should assign all operational machines to making part B, just as we concluded in the linear profit case.

Let’s now back up to the start of week 3. Then, if we assign $x$ machines out of $n$ to making part A (and $n - x$ to making part B), we have

$f_2(n) = \max_{0 \le x \le n} \{5x^2 - 6nx + 3n^2 + f_1[0.8x + 0.6(n - x)]\}$
$\quad = \max_{0 \le x \le n} \{5x^2 - 6nx + 3n^2 + f_1(0.6n + 0.2x)\}$
$\quad = \max_{0 \le x \le n} \{5x^2 - 6nx + 3n^2 + 3(0.6n + 0.2x)^2\}$

as $f_1(n) = 3n^2$. So,

$f_2(n) = \max_{0 \le x \le n} \{5x^2 - 6nx + 3n^2 + 1.08n^2 + 0.72nx + 0.12x^2\}$
$\quad = \max_{0 \le x \le n} \{5.12x^2 - 5.28nx + 4.08n^2\} = 4.08n^2$

when $x = 0$ (this quadratic again opens upward, so only the endpoints matter, and at $x = n$ it is just $3.92n^2$). Thus, as we start week 3, we should assign all operational machines to making part B, just as we concluded in the linear profit case.

Let’s now back up to the start of week 2. Then, as before, if we assign $x$ out of $n$ machines to making part A and $n - x$ to making part B, we have

$f_3(n) = \max_{0 \le x \le n} \{5x^2 - 6nx + 3n^2 + f_2(0.6n + 0.2x)\}$
$\quad = \max_{0 \le x \le n} \{5x^2 - 6nx + 3n^2 + 4.08(0.6n + 0.2x)^2\}$
$\quad = \max_{0 \le x \le n} \{5.1632x^2 - 5.0208nx + 4.4688n^2\}$
$\quad = 4.6112n^2$

when $x = n$ (at $x = 0$ the expression is only $4.4688n^2$). So, as we start week 2, we should assign all operational machines to making part A, which is not what the optimal policy says to do in the linear profit case.

And finally, let’s back up to the start of week 1. Then,

$f_4(n) = \max_{0 \le x \le n} \{5x^2 - 6nx + 3n^2 + f_3(0.6n + 0.2x)\}$
$\quad = \max_{0 \le x \le n} \{5x^2 - 6nx + 3n^2 + 4.6112(0.6n + 0.2x)^2\}$
$\quad = \max_{0 \le x \le n} \{5.184448x^2 - 4.893312nx + 4.660032n^2\}$
$\quad = 4.951168n^2$

when $x = n$. So, as we start week 1, we should assign all operational machines to making part A, just as we concluded in the linear profit case. The quadratic profit function has caused the decision at the start of week 2 in the optimal policy to switch from what it is in the linear profit case.
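The quadratic case has the same shape: $f_k(n) = a_k n^2$, and writing $x = tn$, each stage maximizes a quadratic in $t$ that opens upward on $0 \le t \le 1$, so again only the endpoints matter. A minimal Python sketch along those lines (function name hypothetical, exact arithmetic via `fractions.Fraction`) reproduces the coefficients 3, 4.08, 4.6112, and 4.951168 and the switch at week 2.

```python
from fractions import Fraction

SURVIVE_A = Fraction(4, 5)   # 20% of part-A machines break each week
SURVIVE_B = Fraction(3, 5)   # 40% of part-B machines break each week


def quadratic_policy(weeks=4):
    """Return (a, plan) with f_k(n) = a[k] * n**2 for profits 2x^2 and 3(n - x)^2.

    With x = t*n, stage k maximizes
        2t^2 + 3(1 - t)^2 + a[k-1] * (0.6 + 0.2t)^2    over 0 <= t <= 1,
    a quadratic in t that opens upward, so the maximum is at t = 0 or t = 1.
    """
    a = [Fraction(0)]
    choice = []
    for _ in range(weeks):
        all_a = 2 + a[-1] * SURVIVE_A ** 2   # t = 1: 2n^2 + a(0.8n)^2
        all_b = 3 + a[-1] * SURVIVE_B ** 2   # t = 0: 3n^2 + a(0.6n)^2
        a.append(max(all_a, all_b))
        choice.append("A" if all_a > all_b else "B")
    return a, choice[::-1]                   # week 1 first


a, plan = quadratic_policy()
print(*(float(v) for v in a[1:]), plan)
```

Comparing this with the linear sketch shows that only the stage formula changed; the backward recursion itself did not, which is the sense in which dynamic programming "doesn't miss a beat."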

The next complication we might introduce in these calculations 
is to recognize that some of the given conditions of the problem 
are unrealistic. For example, why would exactly 20% (40%) of the 
machines making part A (part B) during each week break each and 
every week? On average this could perhaps make sense, but from 
week to week to week a better model would specify probability den- 
sity functions for the breakdown percentages. But then the optimal 
policy solution would also have a probability density function and 
what could that mean? One possible answer is to say that the opti- 
mal policy is optimal on the average, i.e., if the multistage decision 
process is one that is repeated over and over many times, then the 
average cost is minimized by that policy. For a process that is to be 
carried out only once, however, this definition of optimality has no 
meaning. What do we do then? There is an answer to that question, 
too, but you are not going to find it here. 

In the spirit of this book’s title, more is not always better. And so, 
at last, this is finally the end. 


Solution to the Challenge Problem 


The directed graph for the modified earth-moving machine problem (with the fixed overhead cost changed to 4 units of money and the storage cost remaining at 1 unit of money per machine per month) is shown in figure 7.19, along with the dynamic programming results. The optimal policy is 1 → 2 → 4 → 6, with a cost of 17, i.e., the optimal production policy (now unique) is to construct 2 machines in February, 6 in March, none in April, 8 in May, and none in June.


FIGURE 7.19. Dynamic programming solution to the challenge problem.


A final note: I opened this book with a number of quotes to indicate the importance of studying extrema. Let me close with one last
quote, an (unintentionally hilarious) illustration of the elusive na- 
ture of extrema among even highly educated professionals. In a re- 
cent (2002) report on the need to sensitize doctors to the benefits of 
eliminating avoidable pain during medical procedures, the authors 
concluded that “Optimal pain control should be the minimum ac- 
ceptable standard.” No one would disagree in spirit with the noble 
nature of this goal, but I’m afraid it is a priori an impossible goal. 
After reading this book, you should know why. 


Appendix A. 
The AM-GM Inequality 


If $x_1, x_2, \ldots, x_n$ are any $n$ nonnegative numbers, $n > 1$, and if $A = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$ is the arithmetic mean of the x’s, and if $G = (x_1x_2\cdots x_n)^{1/n}$ is the geometric mean of the x’s, then $A \ge G$ with equality iff $x_1 = x_2 = \cdots = x_n$.


This was known to Euclid for the n = 2 case. The first proof, for 
arbitrary n, is due to the Scottish mathematician Colin Maclaurin 
(1698-1746), who published it in 1729. 


Proof. Suppose we have any $n$ positive numbers (if one or more of the $x_i$ are zero, then the inequality is trivially obvious, and so we’ll suppose all the $x_i > 0$) whose product is 1, i.e.,

$$x_1x_2\cdots x_n = 1.$$


This may seem at odds with the above statement that the x’s can 
be any n positive numbers, because the product may then not be 1. 
Suppose, in fact, the product is $P$. In that case we divide through both sides by $P$ and replace each $x_i$ with

$$y_i = \frac{x_i}{P^{1/n}}.$$

Then we do have

$$y_1y_2\cdots y_n = 1.$$

I’ll continue on at this point with the y’s and, at the end, simply replace each $y_i$ with $x_i/P^{1/n}$. You’ll see soon how nicely things then turn out.
Now, it is clear that either all of the $y_i$ are equal (and so all are equal to 1) or that they are not all equal. If they are all equal (to 1), it is equally clear that then

$$y_1 + y_2 + \cdots + y_n = n.$$

If they are not all equal, then the claim is that their sum is at least equal to $n$, i.e.,

$$y_1 + y_2 + \cdots + y_n \ge n,$$

with equality iff $y_1 = y_2 = \cdots = y_n$. Thus, our claim is that this last inequality is true whether all the $y_i$ are equal or not (as obviously $n \ge n$). I’ll establish this result in the next paragraph but, assuming for now this is so, you can see how the AM-GM inequality immediately follows. That’s because if we divide through by $n$, we have

$$\frac{y_1 + y_2 + \cdots + y_n}{n} \ge 1 = 1^{1/n} = (y_1y_2\cdots y_n)^{1/n}.$$

Then, replacing each $y_i$ with $x_i/P^{1/n}$ as explained above,

$$\frac{x_1 + x_2 + \cdots + x_n}{nP^{1/n}} \ge \left(\frac{x_1x_2\cdots x_n}{P}\right)^{1/n} = \frac{(x_1x_2\cdots x_n)^{1/n}}{P^{1/n}}$$

or, as $P$ now conveniently vanishes from the inequality, we have

$$\frac{x_1 + x_2 + \cdots + x_n}{n} \ge (x_1x_2\cdots x_n)^{1/n}$$

with equality iff $x_1 = x_2 = \cdots = x_n$. So, all we have to do is show the truth of our assumption, i.e., that, indeed, $y_1 + y_2 + \cdots + y_n \ge n$.

We’ve already seen that the case of all the $y_i$ equal (to 1) is trivial, so now we’ll treat the case where the $y_i$ are not all equal. Can all the $y_i$ be greater than 1? No, as then their product would be greater than 1. Can all of the $y_i$ be less than 1? Again, no, as then their product would be less than 1. So, at least one $y_i$ must be greater than 1 (label it $y_1$) and at least one $y_i$ must be less than 1 (label it $y_2$). Thus, $1 - y_1 < 0$ and $1 - y_2 > 0$, and so

$$(1 - y_1)(1 - y_2) < 0,$$

or

$$1 - y_1 - y_2 + y_1y_2 < 0,$$

or

$$1 + y_1y_2 < y_1 + y_2.$$


What are we going to do with this? We’ll use it to complete an induction proof, i.e., we’ll assume that our claim is true for $n = k$ (that $y_1y_2\cdots y_k = 1$ means $y_1 + y_2 + \cdots + y_k \ge k$) and then show that the claim must be true for $n = k + 1$. Since the claim is obviously true for $n = 1$, then the claim would be true for $n = 2$, and so on; it would be true for all $n \ge 1$.

So, by assumption we have $y_1y_2\cdots y_k = 1$ and $y_1 + y_2 + \cdots + y_k \ge k$, i.e., if any $k$ positive numbers have a product of one then their sum is at least $k$. Our concern now is with $k + 1$ positive numbers whose product is 1: what can we say about their sum? I’ll write these $k + 1$ numbers as $y_i$, $1 \le i \le k + 1$, but they are not necessarily the $k$ numbers $y_i$ above with one more number added. The only important consideration is that now we have $k + 1$ numbers. So, we have

$$y_1y_2\cdots y_ky_{k+1} = 1,$$

and we consider the sum $y_1 + y_2 + \cdots + y_k + y_{k+1}$. If these numbers are all equal, they are all 1 and the sum is $k + 1$. If not, then by the same argument above (labeling a number greater than 1 as $y_1$ and a number less than 1 as $y_2$) it must be true that

$$1 + y_1y_2 < y_1 + y_2,$$

and so

$$y_1 + y_2 + y_3 + \cdots + y_{k+1} > 1 + y_1y_2 + y_3 + \cdots + y_{k+1}.$$

But $y_1y_2 + y_3 + \cdots + y_{k+1}$ is the sum of $k$ (not $k + 1$) positive numbers whose product is $y_1y_2y_3\cdots y_{k+1} = 1$, and we already know that sum is at least $k$. So,

$$y_1 + y_2 + \cdots + y_k + y_{k+1} > 1 + k,$$

and we are done.
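The normalization trick at the heart of this proof is easy to try numerically. Here is a minimal sketch (the helper name `normalize` is hypothetical): it rescales positive numbers by $P^{1/n}$, which is just their geometric mean $G$, so that the rescaled numbers have product 1, and then checks the claim $y_1 + \cdots + y_n \ge n$. Since $(y_1 + \cdots + y_n)/n = A/G$, that claim is the same statement as $A \ge G$.

```python
import math
import random

def normalize(xs):
    """Rescale positive xs by P**(1/n), where P is their product, so that the
    rescaled numbers y_i have product 1 (as in the proof). Also returns the
    scale factor, which equals the geometric mean G of the xs."""
    n = len(xs)
    # Work with logarithms to avoid overflow/underflow when forming the product.
    log_p = sum(math.log(x) for x in xs)
    scale = math.exp(log_p / n)
    return [x / scale for x in xs], scale

random.seed(1)
for _ in range(1000):
    xs = [random.uniform(0.01, 10.0) for _ in range(random.randint(2, 8))]
    ys, g = normalize(xs)
    n = len(xs)
    assert math.isclose(math.prod(ys), 1.0)   # product of the y's is 1
    assert sum(ys) >= n - 1e-12                # the claim y_1 + ... + y_n >= n
    assert sum(xs) / n >= g - 1e-12            # hence A >= G for the original x's
```

The rescaled numbers are all 1 exactly when the original numbers are all equal, which is the equality case.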
Appendix B.
The AM-QM Inequality,
and Jensen’s Inequality


If $x_1, x_2, \ldots, x_n$ are any $n$ real numbers, then

$$\frac{x_1 + x_2 + \cdots + x_n}{n} \le \sqrt{\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n}}$$

with equality iff $x_1 = x_2 = \cdots = x_n \ge 0$.


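Proof. Squaring the arithmetic mean, we have

$$\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right)^2 = \frac{x_1^2 + x_2^2 + \cdots + x_n^2 + \text{all possible } x_ix_j \text{ cross-products with } i \ne j}{n^2}.$$

We can write this more compactly (and more clearly, as well, I think) as

$$\left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2 = \frac{\sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n}\sum_{j=1,\, j \ne i}^{n} x_ix_j}{n^2}.$$

Now, since the square of a real number is never negative, we have

$$0 \le (x_i - x_j)^2 = x_i^2 - 2x_ix_j + x_j^2,$$

and so $2x_ix_j \le x_i^2 + x_j^2$ with equality iff $x_i = x_j$. Thus,

$$\left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2 \le \frac{\frac{1}{2}\sum_{i=1}^{n}\left(x_i^2 + x_i^2\right) + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1,\, j \ne i}^{n}\left(x_i^2 + x_j^2\right)}{n^2}.$$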


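This may look somewhat cryptic, but the right-hand side becomes easy to visualize if you sketch the $n \times n$ array (or square matrix) of values as shown in the table below, where the value of the element at $(i, j)$ is $x_i^2 + x_j^2$. The first summation, $\sum_{i=1}^{n}(x_i^2 + x_i^2)$, is the sum of the entries along the main diagonal, while the second summation, $\sum_{i=1}^{n}\sum_{j=1,\, j \ne i}^{n}(x_i^2 + x_j^2)$, is the sum of all the off-diagonal entries.

$$\begin{array}{c|cccc} & 1 & 2 & \cdots & n \\ \hline 1 & x_1^2 + x_1^2 & x_1^2 + x_2^2 & \cdots & x_1^2 + x_n^2 \\ 2 & x_2^2 + x_1^2 & x_2^2 + x_2^2 & \cdots & x_2^2 + x_n^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n & x_n^2 + x_1^2 & x_n^2 + x_2^2 & \cdots & x_n^2 + x_n^2 \end{array}$$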


Thus, the sum of the two summations is simply the sum of all of the terms in the array, i.e., our inequality becomes

$$\left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2 \le \frac{\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(x_i^2 + x_j^2\right)}{n^2}.$$
The table clearly shows that each $x_i^2$ term, for a given subscript, appears a total of $2n$ times in the array, and so

$$\left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2 \le \frac{\frac{1}{2}\cdot 2n\sum_{i=1}^{n} x_i^2}{n^2} = \frac{\sum_{i=1}^{n} x_i^2}{n}$$

with equality iff $x_1 = x_2 = \cdots = x_n$. The right-hand side is the square of the quadratic mean (QM). Taking the square root (and noting that a number never exceeds its absolute value), we have

$$\frac{x_1 + x_2 + \cdots + x_n}{n} \le \sqrt{\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n}}$$

with equality iff $x_1 = x_2 = \cdots = x_n \ge 0$, and we are done. The right-hand side of this inequality is often called the rms value of the x’s (rms is the abbreviation for “root-mean-square”) as it is the square root of the mean of the squares.
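A quick numerical check of the AM-QM inequality, using the rms value just described, is sketched below; the helper names `arithmetic_mean` and `rms` are ours, not library functions.

```python
import math
import random

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def rms(xs):
    """Root-mean-square (quadratic mean): square root of the mean of the squares."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

random.seed(2)
for _ in range(1000):
    xs = [random.uniform(-5.0, 5.0) for _ in range(random.randint(1, 10))]
    assert arithmetic_mean(xs) <= rms(xs) + 1e-12

print(arithmetic_mean([3.0, 3.0, 3.0]), rms([3.0, 3.0, 3.0]))  # equal values: both 3.0
print(arithmetic_mean([1.0, 2.0, 3.0]), rms([1.0, 2.0, 3.0]))  # 2.0 versus about 2.16
```

Equal but negative values, such as $-3, -3$, give an arithmetic mean of $-3$ and an rms value of $3$, which is why equality also needs the common value to be nonnegative.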

The AM-QM inequality is actually just a special case of a far 
more general result called Jensen’s inequality (see section 2.2 for who 
Jensen was). To understand Jensen’s result, let’s start by considering the graph of the function $f(x) = x^2$ in figure B1, which is of course an upward-opening parabola. If we take any two values of $x$, say $x_1$ and $x_2$, then it is geometrically clear that the chord joining the two points on the parabola $(x_1, x_1^2)$ and $(x_2, x_2^2)$ lies above the section of the parabola cut off by the chord. This property identifies $f(x) = x^2$ as what mathematicians call a strictly convex function. If, on the other hand, $f(x)$ is a function whose graph opens downward (e.g., $f(x) = \sin(x)$ for $0 < x < \pi$), then the chord joining any two points on $f(x)$ lies below the curve and $f(x) = \sin(x)$ is said to be strictly concave over the interval $0 < x < \pi$.

For such functions, Jensen’s inequality says: 


if $f(x)$ is strictly convex (strictly concave) on some interval, and if $x_1, x_2, \ldots, x_n$ are $n$ values of $x$ from that interval, and if $c_1, c_2, \ldots, c_n$ are $n$ positive constants such that $c_1 + c_2 + \cdots + c_n = 1$, then

$$f\left(\sum_{i=1}^{n} c_ix_i\right) \le (\ge) \sum_{i=1}^{n} c_if(x_i)$$

with equality iff $x_1 = x_2 = \cdots = x_n$, where we use $\le$ if $f(x)$ is convex, and we use $\ge$ if $f(x)$ is concave.

FIGURE B1. A parabola is a strictly convex function.
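Jensen’s inequality is easy to spot-check numerically with random positive weights that sum to 1. This sketch uses $f(x) = x^2$ and $f(x) = e^x$ (both strictly convex on the whole real line) and $f(x) = \sin(x)$ on $0 \le x \le \pi$ (strictly concave). The helpers `jensen_gap` and `random_weights` are hypothetical names, not library routines.

```python
import math
import random

def jensen_gap(f, cs, xs):
    """Return sum(c_i f(x_i)) - f(sum(c_i x_i)).

    By Jensen's inequality this is >= 0 for convex f, <= 0 for concave f,
    and 0 when all the x_i are equal.
    """
    mean = sum(c * x for c, x in zip(cs, xs))
    return sum(c * f(x) for c, x in zip(cs, xs)) - f(mean)

def random_weights(n):
    """n positive weights that sum to 1."""
    w = [random.uniform(0.01, 1.0) for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]

random.seed(3)
for _ in range(1000):
    n = random.randint(2, 6)
    cs = random_weights(n)
    xs = [random.uniform(-3.0, 3.0) for _ in range(n)]
    assert jensen_gap(lambda x: x * x, cs, xs) >= -1e-12   # convex
    assert jensen_gap(math.exp, cs, xs) >= -1e-12           # convex
    thetas = [random.uniform(0.0, math.pi) for _ in range(n)]
    assert jensen_gap(math.sin, cs, thetas) <= 1e-12        # concave on [0, pi]
```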


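For example, it is geometrically clear that $f(x) = x^2$ is strictly convex on the entire real line and so, if we pick $c_1 = c_2 = \cdots = c_n = 1/n$, then Jensen’s inequality becomes, for any set of $n$ numbers $x_1, x_2, \ldots, x_n$,

$$\left(\sum_{i=1}^{n}\frac{1}{n}x_i\right)^2 \le \sum_{i=1}^{n}\frac{1}{n}x_i^2,$$

or

$$\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right)^2 \le \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n},$$

with equality iff $x_1 = x_2 = \cdots = x_n$, which is the AM-QM inequality (in squared form).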
As another example, if $f(\theta) = \sin(\theta)$, which is strictly concave in the interval $0 \le \theta \le \pi$, and if we pick $c_1 = c_2 = \cdots = c_n = 1/n$, then Jensen’s inequality becomes, for any set of $n$ numbers in the interval 0 to $\pi$,

$$\sin\left(\frac{\theta_1 + \theta_2 + \cdots + \theta_n}{n}\right) \ge \frac{1}{n}\sum_{i=1}^{n}\sin(\theta_i),$$

with equality iff $\theta_1 = \theta_2 = \cdots = \theta_n$, a result used in chapter 2 to show that the maximum area N-gon inscribed in a given circle is a regular N-gon.
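To see the inscribed N-gon application numerically, here is a minimal sketch under an assumed setup (chapter 2’s own argument is not reproduced here): an N-gon inscribed in a unit circle is described by its central angles $\theta_1, \ldots, \theta_N$, which sum to $2\pi$, and its area is the sum of the triangle areas $\tfrac12\sin(\theta_i)$ formed with the center. The concave form of Jensen’s inequality with $c_i = 1/N$ then bounds this by $\tfrac{N}{2}\sin(2\pi/N)$, the area of the regular N-gon. The sketch keeps every central angle below $\pi$ so the $\theta_i$ stay in the interval where the inequality was stated; the function names are hypothetical.

```python
import math
import random

def inscribed_area(thetas):
    """Area of a polygon inscribed in a unit circle, given its central angles
    (each in (0, pi), summing to 2*pi): the sum of the triangle areas
    (1/2) sin(theta_i) formed with the center."""
    return 0.5 * sum(math.sin(t) for t in thetas)

def random_central_angles(n):
    """Random central angles summing to 2*pi, each less than pi (rejection sampling)."""
    while True:
        w = [random.random() for _ in range(n)]
        total = sum(w)
        thetas = [2 * math.pi * v / total for v in w]
        if max(thetas) < math.pi:
            return thetas

random.seed(4)
for n in range(3, 9):
    regular = inscribed_area([2 * math.pi / n] * n)   # (n/2) sin(2*pi/n)
    for _ in range(200):
        assert inscribed_area(random_central_angles(n)) <= regular + 1e-12
    print(n, regular)
```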

And finally, as you probably suspect by now, we can derive the AM-GM inequality (which started all of our discussion of inequalities) as also simply a special case of Jensen’s inequality. If we pick $f(x) = -\ln(x)$, a function that is strictly convex over the positive real axis ($f(x)$ is complex for $x < 0$), then Jensen’s inequality becomes

$$-\ln\left(\sum_{i=1}^{n} c_ix_i\right) \le -\sum_{i=1}^{n} c_i\ln(x_i) \quad \text{for } x_i > 0,$$

with equality iff $x_1 = x_2 = \cdots = x_n$. That is,

$$\ln(c_1x_1 + c_2x_2 + \cdots + c_nx_n) \ge \ln\left(x_1^{c_1}\right) + \ln\left(x_2^{c_2}\right) + \cdots + \ln\left(x_n^{c_n}\right) = \ln\left(x_1^{c_1}x_2^{c_2}\cdots x_n^{c_n}\right).$$

Thus,

$$c_1x_1 + c_2x_2 + \cdots + c_nx_n \ge x_1^{c_1}x_2^{c_2}\cdots x_n^{c_n},$$

and so, if we pick $c_1 = c_2 = \cdots = c_n = 1/n$, then, with all the $x_i > 0$, we have

$$\frac{x_1 + x_2 + \cdots + x_n}{n} \ge (x_1x_2\cdots x_n)^{1/n},$$

with equality iff $x_1 = x_2 = \cdots = x_n$, which is the AM-GM inequality.
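The intermediate result $c_1x_1 + \cdots + c_nx_n \ge x_1^{c_1}\cdots x_n^{c_n}$ is the weighted form of the AM-GM inequality. A short check, with hypothetical helper names and random positive weights that sum to 1, is sketched below; it computes the weighted geometric mean through logarithms, just as the derivation does.

```python
import math
import random

def weighted_am(cs, xs):
    return sum(c * x for c, x in zip(cs, xs))

def weighted_gm(cs, xs):
    # x_1**c_1 * ... * x_n**c_n, computed as exp(sum of c_i ln x_i)
    return math.exp(sum(c * math.log(x) for c, x in zip(cs, xs)))

random.seed(5)
for _ in range(1000):
    n = random.randint(2, 6)
    w = [random.uniform(0.01, 1.0) for _ in range(n)]
    total = sum(w)
    cs = [v / total for v in w]
    xs = [random.uniform(0.01, 10.0) for _ in range(n)]
    assert weighted_am(cs, xs) >= weighted_gm(cs, xs) * (1 - 1e-12)
```

With equal weights $c_i = 1/n$ the two helpers reduce to the ordinary arithmetic and geometric means.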


Clearly, Jensen’s inequality is a very powerful result and deserves 
to be better known than it is, at least among engineers and scientists, 
especially considering that Jensen himself was an engineer! It is also 
not difficult to prove; I’ll do it here for the case of f(x) strictly convex, and the strictly concave version will then follow immediately.
That is so because if f(x) is strictly concave, then — f(x) is strictly 
convex and, after multiplying through the inequality for — f(x) by 
minus one, all that happens is that the sense of the inequality is 
reversed. 


Proof (by induction) of Jensen’s Inequality. 


Step 1. The inequality is true for the case of $n = 2$ and $f(x)$ convex because it says

$$f(c_1x_1 + c_2x_2) \le c_1f(x_1) + c_2f(x_2), \quad c_1 + c_2 = 1,$$

with equality iff $x_1 = x_2$. That is, if we drop the subscripts and simply write $c$ for $c_1$ and $1 - c$ for $c_2$, the inequality says (for $n = 2$)

$$f[cx_1 + (1 - c)x_2] \le cf(x_1) + (1 - c)f(x_2), \quad 0 < c < 1.$$


But this is just what is meant geometrically by saying $f(x)$ is strictly convex; the left-hand side is the height of the plot of $f$ at an arbitrary value of $x$ between $x = x_1$ and $x = x_2$ ($x$ varies from $x_1$ to $x_2$ as $c$ varies from 1 to 0), while the right-hand side is the height of the straight-line chord joining the two points $(x_1, f(x_1))$ and $(x_2, f(x_2))$, which varies linearly between $f(x_1)$ and $f(x_2)$ as $c$ varies between 1 and 0. That is, for $n = 2$, the inequality simply says that the chord lies above the graph of the function for any $x$ such that $x_1 < x < x_2$.

To show the iff condition, first suppose $x_1 = x_2$. Then, the inequality says

$$f[cx_1 + (1 - c)x_1] \le cf(x_1) + (1 - c)f(x_1).$$

Thus, as both sides reduce to $f(x_1)$, we have equality. To go in the opposite direction, now suppose that

$$f[cx_1 + (1 - c)x_2] = cf(x_1) + (1 - c)f(x_2), \quad 0 < c < 1.$$

If $x_1 \ne x_2$, this says the function and the chord meet at a point strictly between $x_1$ and $x_2$, which is impossible for a strictly convex $f$. So we must have $x_1 = x_2$. This shows the iff condition for the $n = 2$ case.

Step 2. We now assume that the inequality is true for $n = k$ and show that this implies it is true for $n = k + 1$. That is, we take as true

$$f\left(\sum_{i=1}^{k} c_ix_i\right) \le \sum_{i=1}^{k} c_if(x_i) \quad \text{with } \sum_{i=1}^{k} c_i = 1,\ c_i > 0$$

and ask what we can say about

$$f\left(\sum_{i=1}^{k+1} c_ix_i\right) \quad \text{with } \sum_{i=1}^{k+1} c_i = 1,\ c_i > 0?$$

Notice that I am not assuming the first $k$ values of the $k + 1$ $c_i$ are the $k$ $c_i$; only that whatever the $c_i$ are they satisfy the conditions of all being positive, and summing to 1.

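Now, we can write

$$f\left(\sum_{i=1}^{k+1} c_ix_i\right) = f\left[c_{k+1}x_{k+1} + (1 - c_{k+1})\sum_{i=1}^{k}\frac{c_i}{1 - c_{k+1}}x_i\right].$$

From Step 1, we have, by the definition of strict convexity of $f$, that

$$f[(1 - c)u + cv] \le cf(v) + (1 - c)f(u),$$

and so

$$f\left(\sum_{i=1}^{k+1} c_ix_i\right) \le c_{k+1}f(x_{k+1}) + (1 - c_{k+1})f\left(\sum_{i=1}^{k}\frac{c_i}{1 - c_{k+1}}x_i\right).$$

Now, since $c_i/(1 - c_{k+1}) > 0$ for all $i$ (because $c_i > 0$ for all $i$, and $c_{k+1} < 1$ because the $c_i$ sum to 1), and since

$$\sum_{i=1}^{k}\frac{c_i}{1 - c_{k+1}} = \frac{c_1 + c_2 + \cdots + c_k}{1 - c_{k+1}} = \frac{1 - c_{k+1}}{1 - c_{k+1}} = 1,$$

then from the assumed truth of the $n = k$ case, we have

$$f\left(\sum_{i=1}^{k}\frac{c_i}{1 - c_{k+1}}x_i\right) \le \sum_{i=1}^{k}\frac{c_i}{1 - c_{k+1}}f(x_i).$$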
But this says

$$f\left(\sum_{i=1}^{k+1} c_ix_i\right) \le c_{k+1}f(x_{k+1}) + (1 - c_{k+1})\sum_{i=1}^{k}\frac{c_i}{1 - c_{k+1}}f(x_i) = c_{k+1}f(x_{k+1}) + \sum_{i=1}^{k} c_if(x_i) = \sum_{i=1}^{k+1} c_if(x_i).$$

That is, the truth of the inequality for the $n = k + 1$ case follows from the assumption the inequality holds for the $n = k$ case. And, since the inequality does hold for the $n = 2$ case, it holds for all $n \ge 2$ as well, and we are done. I’ll let you fill in the remaining iff arguments.
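The heart of Step 2 is the regrouping $\sum_{i=1}^{k+1} c_ix_i = c_{k+1}x_{k+1} + (1 - c_{k+1})\sum_{i=1}^{k}\frac{c_i}{1 - c_{k+1}}x_i$, in which the renormalized weights are again positive and sum to 1. The sketch below (function name hypothetical, test function $e^x$ chosen only as a convenient strictly convex example) computes the middle term produced by that step and checks that it sits between the two sides of the inequality.

```python
import math
import random

def induction_step_bound(f, cs, xs):
    """Middle term of Step 2:
        c_{k+1} f(x_{k+1}) + (1 - c_{k+1}) f(sum_i c_i' x_i),
    where c_i' = c_i / (1 - c_{k+1}) for i = 1..k."""
    c_last, x_last = cs[-1], xs[-1]
    renorm = [c / (1.0 - c_last) for c in cs[:-1]]
    assert math.isclose(sum(renorm), 1.0)   # the renormalized weights sum to 1
    inner = sum(c * x for c, x in zip(renorm, xs[:-1]))
    return c_last * f(x_last) + (1.0 - c_last) * f(inner)

f = math.exp  # a strictly convex test function
random.seed(6)
for _ in range(1000):
    n = random.randint(2, 6)
    w = [random.uniform(0.01, 1.0) for _ in range(n)]
    total = sum(w)
    cs = [v / total for v in w]
    xs = [random.uniform(-2.0, 2.0) for _ in range(n)]
    lhs = f(sum(c * x for c, x in zip(cs, xs)))
    rhs = sum(c * f(x) for c, x in zip(cs, xs))
    mid = induction_step_bound(f, cs, xs)
    assert lhs <= mid + 1e-12 and mid <= rhs + 1e-12
```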

As a final note, Jensen’s inequality (1906) was actually derived earlier (1889) by the German mathematician Otto Hölder (1859-1937), but in a formal, nongeometric context, i.e., Hölder’s initial assumption was simply that the second derivative of $f(x)$ exist and be nonnegative. The geometric interpretation of a convex function is Jensen’s (as is the term convex). Another famous inequality is named after Hölder (1884), but it is not the one studied here.


Appendix C. 
“The Sagacity of the 
Bees” (the preface to 
Book 5 of Pappus’
Mathematical Collection) 


Though God has given to men, most excellent Megethion, the best 
and most perfect understanding of wisdom and mathematics, He 
has allotted a partial share to some of the unreasoning creatures 
as well. To men, as being endowed with reason, He granted that 
they should do everything in the light of reason and demonstra- 
tion, but to the other unreasoning creatures He gave only this gift, 
that each of them should, in accordance with a certain natural 
forethought, obtain so much as is needful for supporting life. This 
instinct may be observed to exist in many other species of crea- 
tures, but it is specially marked among bees. Their good order and 
their obedience to the queens who rule in their commonwealths 
are truly admirable, but much more admirable still is their emu- 
lation, their cleanliness in the gathering of honey, and the fore- 
thought and domestic care they give to its protection. Believing 
themselves, no doubt, to be entrusted with the task of bringing 
from the gods to the more cultured part of mankind a share of am- 
brosia in this form, they do not think it proper to pour it carelessly 
into earth or wood or any other unseemly and irregular material, 
but, collecting the fairest parts of the sweetest flowers growing on 
the earth, from them they prepare for the reception of the honey 
the vessels called honeycombs, [with cells] all equal, similar and 
adjacent, and hexagonal in form. 
That they have contrived this in accordance with a certain geometrical forethought we may thus infer. They would necessarily
think that the figures must all be adjacent one to another and have 
their sides common, in order that nothing else might fall into the 
interstices and so defile their work. Now there are only three rec- 
tilineal figures which would satisfy the condition, I mean regular 
figures which are equilateral and equiangular, inasmuch as irreg- 
ular figures would be displeasing to the bees. For equilateral tri- 
angles and squares and hexagons can lie adjacent to one another 
and have their sides in common without irregular interstices. For 
the space about the same point can be filled by six equilateral triangles and six angles, of which each is 2/3 [of a] right angle, or by four squares and four right angles, or by three hexagons and three angles of a hexagon, of which each is 1 1/3 [of a] right angle. But three pentagons would not suffice to fill the space about the same point, and four would be more than sufficient; for three angles of the pentagon are less than four right angles (inasmuch as each angle is 1 1/5 [of a] right angle), and four angles are greater than four right angles. Nor can three heptagons be placed about the same point so as to have their sides adjacent to each other; for three angles of a heptagon are greater than four right angles (inasmuch as each is 1 3/7 [of a] right angle). And the same argument can be
applied even more to polygons with a greater number of angles. 
There being, then, three figures capable by themselves of filling 
up the space around the same point, the triangle, the square and 
the hexagon, the bees in their wisdom chose for their work that 
which has the most angles, perceiving that it would hold more 
honey than either of the two others. 

Bees, then, know just this fact which is useful to them, that the 
hexagon is greater than the square and the triangle and will hold 
more honey for the same expenditure of material in constructing 
each. But we, claiming a greater share in wisdom than the bees, 
will investigate a somewhat wider problem, namely that, of all 
equilateral and equiangular plane figures having an equal perimeter, 
that which has the greater number of angles is always greater, and the 
greatest of them all is the circle having its perimeter equal to them. 
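Both Pappus’ angle argument and his wider claim can be checked with a little arithmetic. Measured in right angles, the interior angle of a regular n-gon is $2 - 4/n$, so copies of it fill the four right angles about a point exactly when $4/(2 - 4/n) = 2n/(n - 2)$ is a whole number. And a regular n-gon of perimeter $p$ has area $p^2/(4n\tan(\pi/n))$, which grows with $n$ toward the circle’s $p^2/(4\pi)$. This sketch uses exact fractions for the angles; the function names are hypothetical.

```python
import math
from fractions import Fraction

def interior_angle_in_right_angles(n):
    """Interior angle of a regular n-gon, measured in right angles (exact)."""
    return Fraction(2 * (n - 2), n)

# Which regular polygons fill the space about a point (four right angles)?
for n in range(3, 9):
    angle = interior_angle_in_right_angles(n)
    copies = Fraction(4) / angle
    print(n, angle, copies, "fills" if copies.denominator == 1 else "does not fill")

def regular_polygon_area(n, perimeter):
    """Area of a regular n-gon with the given perimeter."""
    return perimeter**2 / (4 * n * math.tan(math.pi / n))

p = 1.0
areas = [regular_polygon_area(n, p) for n in (3, 4, 6, 12, 100)]
circle = p**2 / (4 * math.pi)   # circle with circumference p
print(areas, circle)
```

Only $n = 3, 4, 6$ give a whole number of copies, because $n - 2$ must divide $2n = 2(n - 2) + 4$, and hence divide 4.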


Pappus’ ancient words motivated mathematicians many centuries 
later, by then in possession of the calculus, to analytically study the 
“best” way to make a honeycomb. Two such mathematicians were the German Johann Samuel König (1712-57), who published his analysis (with some errors) in 1740, and then later (1755) Ruggero Boscovich (1711-87). You can find Boscovich’s work described in the paper by R. M. Dimitric: “Using Less Calculus in Teaching Calculus: An Historical Approach” (Mathematics Magazine, June 2001, pp. 201-11). For what a modern mathematician has to say on just how well the bees actually do, see L. Fejes Tóth, “What the Bees Know and What They Do Not Know” (Bulletin of the American Mathematical Society, 1964, pp. 468-81). Tóth concludes that they do pretty well! As he wrote, “We must admit that all this [i.e., Tóth’s construction of a honeycomb structure just slightly more efficient than the structures real bees actually build] has no practical consequence. By building such cells [Tóth’s cells] the bees would save per
cell less than 0.35% of the area of an opening... under such condi- 
tions the above ‘saving’ is quite illusory. Besides, the building style of 
bees is definitely simpler. . . so we would fail in shaking someone’s 
conviction that the bees have a deep geometrical intuition.” 


Appendix D. 
Every Convex Figure Has 


a Perimeter Bisector 


Let $\varphi$ denote a convex figure, and $Q$ a point not inside or on the boundary edge of $\varphi$, as shown in figure D1. Let a line be drawn through $Q$, with $\alpha$ denoting the angle that line makes with the x-axis. In the figure I’ve assumed that $\varphi$ is positioned so that it lies entirely in the first quadrant, above the positive x-axis and to the right of the positive y-axis. To make things really easy to visualize and explain, I have also assumed that $\varphi$ is positioned so that the left- and bottommost points of $\varphi$ are to the right of and above $Q$, respectively, as shown in figure D1. None of these assumptions limits the generality of our eventual result. Finally, let $P(\alpha)$ denote the fraction of $\varphi$’s perimeter below the line we drew through $Q$. Thus $0 \le P(\alpha) \le 1$, with $P(0°) = 0$ and $P(90°) = 1$.

FIGURE D1. For some $\alpha$, the line through $Q$ bisects the perimeter of $\varphi$.

It is geometrically clear that $P(\alpha)$ is a smoothly increasing function of increasing $\alpha$; what matters here is that $P(\alpha)$ is what mathematicians call a continuous function (there are no sudden, discontinuous jumps in the value of $P(\alpha)$ as $\alpha$ increases from 0° to 90°). So, in particular, since $P(\alpha)$ goes continuously from 0 to 1, there must be some $\alpha = \hat{\alpha}$ where $P(\hat{\alpha}) = \frac{1}{2}$, i.e., at angle $\alpha = \hat{\alpha}$, the line through $Q$ bisects the perimeter of $\varphi$. Notice that this is an existence proof; it tells us only, for a given $\varphi$ and $Q$, that there is a line at some angle $\hat{\alpha}$ that bisects the perimeter of $\varphi$, but it does not tell us what $\hat{\alpha}$ is. It should also now be clear, since we could locate $Q$ in infinitely many places, that there is not just one perimeter bisector for $\varphi$ but, in fact, there is an infinity of perimeter bisectors.
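Although the argument above only proves existence, the same continuity suggests a practical way to locate the bisecting angle: bisection on the interval from 0° to 90°. The sketch below assumes, purely for illustration, that the convex figure is a polygon (here the square with corners (1, 1) and (2, 2)) and that $Q$ is the origin; the function names are hypothetical. For this symmetric example the answer should be 45°.

```python
import math

def fraction_below(poly, q, alpha):
    """Fraction of the perimeter of the convex polygon `poly` (list of (x, y)
    vertices in order) lying below the line through q at angle alpha
    (radians) to the x-axis."""
    qx, qy = q
    ca, sa = math.cos(alpha), math.sin(alpha)

    def side(p):
        # Cross product of the line direction with p - q; negative when p is
        # below the line (clockwise from the line's direction).
        return -(p[0] - qx) * sa + (p[1] - qy) * ca

    below = total = 0.0
    for p, r in zip(poly, poly[1:] + poly[:1]):
        length = math.dist(p, r)
        total += length
        sp, sr = side(p), side(r)
        if sp < 0 and sr < 0:
            below += length                      # whole edge is below
        elif sp < 0 or sr < 0:
            t = sp / (sp - sr)                   # where the edge crosses the line
            below += length * (t if sp < 0 else 1.0 - t)
    return below / total

def perimeter_bisector_angle(poly, q, tol=1e-12):
    """Bisection on alpha in [0, pi/2], relying on P(0) = 0 < 1/2 < 1 = P(pi/2)."""
    lo, hi = 0.0, math.pi / 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fraction_below(poly, q, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

square = [(1.0, 1.0), (2.0, 1.0), (2.0, 2.0), (1.0, 2.0)]
alpha = perimeter_bisector_angle(square, (0.0, 0.0))
print(math.degrees(alpha))   # about 45, by the symmetry of this example
```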


Appendix E. 
The Gravitational 
Free-Fall Descent Time 


along a Circle 


The exact analysis of Galileo’s problem, that of determining the 
descent time of a bead, due to gravity, constrained to a vertical 
circular path of radius L, is a classic in the marriage of physics 
and calculus. In figure E1, a bead of mass m is constrained to move 
along a vertical circular wire arc that threads through a hole in the 
bead. It is assumed that friction can be ignored. The initial angle 
the radius to the bead makes with the vertical radius is $\alpha$, and the instantaneous angle, as the bead slides along the wire, is $\theta$ (also measured with respect to the vertical radius). That is, $\theta(t = 0) = \alpha$. If the bead arrives at the bottom of the wire at time $t = T$, then of course $\theta(t = T) = 0$. Our problem here is to calculate $T$.

At time $t$ let the distance along the circular path from the bead to the bottom of the wire be $s$, and let $v(t)$ be the speed of the bead. Then, since we are ignoring friction, we can set the bead’s change in kinetic energy of motion equal to the change in its potential energy of position. We assume that the bead starts its fall from rest, i.e., that $v(0) = 0$. Then, at time $t$, and writing $g$ for the acceleration of gravity, we have

$$\frac{1}{2}mv^2 = mg\{L\cos(\theta) - L\cos(\alpha)\}.$$

Also, since $s = L\theta$ and as $v = ds/dt$, then

$$v = L\frac{d\theta}{dt}$$

and, thus,

$$\frac{1}{2}L^2\left(\frac{d\theta}{dt}\right)^2 = gL\{\cos(\theta) - \cos(\alpha)\}.$$


We can now take great advantage of Leibniz’s differential nota- 
tion, treat the differential dt just as an algebraic quantity, and solve 
for it. To start, write 


$$\frac{L}{2g}\left(\frac{d\theta}{dt}\right)^2 = \cos(\theta) - \cos(\alpha).$$
Since $\theta$ decreases as $t$ (time) increases (because the bead is sliding downward) we know the change in $\theta$ has algebraic sign opposite to that of the change in $t$. That is, $d\theta$ and $dt$ have opposite signs, and so we use the negative sign with the square root and write

$$dt = -\sqrt{\frac{L}{2g}}\,\frac{d\theta}{\sqrt{\cos(\theta) - \cos(\alpha)}}.$$

For the complete descent, we have $t$ going from 0 to $T$ as $\theta$ goes from $\alpha$ to 0, and so, integrating,

$$\int_0^T dt = T = -\sqrt{\frac{L}{2g}}\int_\alpha^0 \frac{d\theta}{\sqrt{\cos(\theta) - \cos(\alpha)}},$$

or

$$T = \sqrt{\frac{L}{2g}}\int_0^\alpha \frac{d\theta}{\sqrt{\cos(\theta) - \cos(\alpha)}}.$$


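From the trigonometric half-angle identities,

$$\cos(\theta) = 1 - 2\sin^2\left(\tfrac{1}{2}\theta\right),$$

$$\cos(\alpha) = 1 - 2\sin^2\left(\tfrac{1}{2}\alpha\right),$$

and so

$$T = \frac{1}{2}\sqrt{\frac{L}{g}}\int_0^\alpha \frac{d\theta}{\sqrt{\sin^2\left(\tfrac{1}{2}\alpha\right) - \sin^2\left(\tfrac{1}{2}\theta\right)}}.$$

Since $\alpha$ is a constant, then so is $\sin(\tfrac{1}{2}\alpha)$, which I’ll now write as simply $k$. Thus,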
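$$T = \frac{1}{2}\sqrt{\frac{L}{g}}\int_0^\alpha \frac{d\theta}{\sqrt{k^2 - \sin^2\left(\tfrac{1}{2}\theta\right)}}, \qquad k = \sin\left(\tfrac{1}{2}\alpha\right).$$

Next, if we make the change of variable from $\theta$ to $\beta$, defining $\beta$ to be such that

$$\sin(\beta) = \frac{\sin\left(\tfrac{1}{2}\theta\right)}{\sin\left(\tfrac{1}{2}\alpha\right)} = \frac{\sin\left(\tfrac{1}{2}\theta\right)}{k},$$

then as $\theta$ varies from $\alpha$ to 0, we have $\sin(\beta)$ varying from 1 to 0, i.e., $\beta$ varies from 90° ($= \pi/2$ radians) to 0. Differentiating the relationship $\sin(\tfrac{1}{2}\theta) = k\sin(\beta)$ with respect to $\theta$, using the chain rule, we have

$$\frac{1}{2}\cos\left(\tfrac{1}{2}\theta\right) = k\cos(\beta)\frac{d\beta}{d\theta},$$

or, solving for $d\theta$,

$$d\theta = \frac{2k\cos(\beta)}{\cos\left(\tfrac{1}{2}\theta\right)}\,d\beta.$$

Next, we use $\sin(\tfrac{1}{2}\theta) = k\sin(\beta)$ to write (since $\sin^2 + \cos^2 = 1$)

$$\cos\left(\tfrac{1}{2}\theta\right) = \sqrt{1 - k^2\sin^2(\beta)}.$$

Also, since

$$\cos(\beta) = \sqrt{1 - \sin^2(\beta)} = \frac{1}{k}\sqrt{k^2 - \sin^2\left(\tfrac{1}{2}\theta\right)},$$

then

$$d\theta = \frac{2\sqrt{k^2 - \sin^2\left(\tfrac{1}{2}\theta\right)}}{\sqrt{1 - k^2\sin^2(\beta)}}\,d\beta.$$

Inserting this into the integral for $T$ (and modifying the limits to fit the new variable of integration, $\beta$) we arrive at our final answer:

$$T = \sqrt{\frac{L}{g}}\int_0^{\pi/2} \frac{d\beta}{\sqrt{1 - k^2\sin^2(\beta)}}, \qquad k = \sin\left(\tfrac{1}{2}\alpha\right).$$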


None of the math in this appendix had been invented yet in 
Galileo’s time, and so his approach to studying the descent time 
along a circular path was by the different, approximate method dis- 
cussed in section 6.1. And even after the integral for T had been 
derived (by 1700), nobody knew how to evaluate it. Not even the 
genius of Euler or Newton could see how to do it. All of the attempts 
to express the integral in terms of the then-known elementary func- 
tions (e.g., logarithms, powers, exponentials, trigonometric func- 
tions) failed. It wasn’t until more than 150 years after Galileo’s 
death, with the work of the French mathematician Adrien-Marie Legendre (1752-1833), that it was appreciated that the failure is due to impossibility; the integral for $T$ represents an entirely new function! The integral, called the complete elliptic integral of the first kind, has been numerically evaluated for numerous values of $\alpha$ and can be found in many mathematical tables. For example, if $\alpha = 90°$ (the bead descends along a full one-quarter arc of a circle), then

$$T = 1.8541\sqrt{\frac{L}{g}}.$$
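Today the tabulated values are easy to reproduce. The sketch below computes $K(k) = \int_0^{\pi/2} d\beta/\sqrt{1 - k^2\sin^2(\beta)}$ in two independent ways: by Simpson’s rule, and by Gauss’s arithmetic-geometric mean (AGM) formula $K(k) = \pi/(2\,\mathrm{AGM}(1, \sqrt{1 - k^2}))$. The AGM formula is a standard result that is not derived in this appendix, and the function names are hypothetical. The factor $T/\sqrt{L/g}$ is then $K(\sin(\alpha/2))$.

```python
import math

def K_simpson(k, steps=1000):
    """Complete elliptic integral of the first kind, K(k), by composite
    Simpson's rule on [0, pi/2] (steps must be even; fine for 0 <= k < 1)."""
    def integrand(beta):
        return 1.0 / math.sqrt(1.0 - (k * math.sin(beta)) ** 2)

    h = (math.pi / 2) / steps
    total = integrand(0.0) + integrand(math.pi / 2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3

def K_agm(k, tol=1e-15):
    """K(k) = pi / (2 * AGM(1, sqrt(1 - k^2))), via the arithmetic-geometric mean."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    while abs(a - b) > tol:
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2 * a)

def descent_time_factor(alpha_degrees):
    """T / sqrt(L/g) for a bead released from rest at angle alpha (degrees)."""
    k = math.sin(math.radians(alpha_degrees) / 2)
    return K_agm(k)

print(descent_time_factor(90.0))          # about 1.8541
print(K_simpson(math.sin(math.pi / 4)))   # the same value by direct quadrature
print(descent_time_factor(1e-6))          # close to pi/2 for a tiny release angle
```

For small $\alpha$ the factor tends to $\pi/2$, a quarter of the familiar small-oscillation pendulum period $2\pi\sqrt{L/g}$.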


Appendix F. 
The Area Enclosed by a 
Closed Curve 


Imagine that the closed, non-self-intersecting curve C shown in 
figure F1 is described by the parametric equations 


$$x = x(t),$$

$$y = y(t),$$

where the parameter $t$ denotes time. That is, at time $t = 0$ we imagine a point particle is at $A$, the location of the left vertical tangent to $C$. Then, as time increases, the particle moves according to the parametric equations, thereby tracing out the curve $C$ until it returns to $A$ at time $t = T_A$. We also define the time $t = T_B$ as the time at which the moving particle reaches $B$, the location of the right vertical tangent to $C$. (The two vertical tangent lines are called $C$’s lines of support.) To be very specific, let’s also assume that the parametric equations describe a clockwise motion of the particle. The claim, then, is that the area enclosed by $C$ is given by

$$\text{enclosed area} = \frac{1}{2}\int_0^{T_A}(y\dot{x} - x\dot{y})\,dt,$$

where $\dot{x} = dx/dt$ and $\dot{y} = dy/dt$, where I am using Newton’s dot notation to denote a time derivative.

Before deriving this result, let me give you an example of its use. 
For the special case of C, a circle of unit radius centered on the origin 
(see figure F2), the parametric equations of C are 
FIGURE F1. A closed, non-self-intersecting curve C is the path of a moving point.


$$x(t) = -\cos(t),$$

$$y(t) = \sin(t),$$

with $T_A = 2\pi$ and $T_B = \pi$. This describes clockwise motion, with $A = (-1, 0)$ and $B = (1, 0)$. Now, since $\dot{x} = \sin(t)$ and $\dot{y} = \cos(t)$, then the claim says

$$\text{enclosed area} = \frac{1}{2}\int_0^{2\pi}\left[\sin^2(t) + \cos^2(t)\right]dt = \frac{1}{2}\int_0^{2\pi}dt = \frac{2\pi}{2} = \pi,$$


which is, indeed, the area of the circle. 
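The formula is also easy to test numerically on a curve whose area is known in advance. This sketch assumes the clockwise ellipse $x(t) = -a\cos(t)$, $y(t) = b\sin(t)$, $0 \le t \le 2\pi$ (the unit circle above is the case $a = b = 1$), whose area is $\pi ab$, and approximates $\tfrac12\int_0^{T_A}(y\dot{x} - x\dot{y})\,dt$ with the trapezoidal rule. The helper name `enclosed_area` is hypothetical.

```python
import math

def enclosed_area(x, y, xdot, ydot, t_end, steps=10000):
    """Approximate (1/2) * integral_0^t_end (y*xdot - x*ydot) dt by the
    trapezoidal rule. Positive for clockwise traversal, as in the text."""
    h = t_end / steps

    def integrand(t):
        return y(t) * xdot(t) - x(t) * ydot(t)

    s = 0.5 * (integrand(0.0) + integrand(t_end))
    s += sum(integrand(i * h) for i in range(1, steps))
    return 0.5 * s * h

a, b = 3.0, 2.0
area = enclosed_area(
    x=lambda t: -a * math.cos(t),
    y=lambda t: b * math.sin(t),
    xdot=lambda t: a * math.sin(t),
    ydot=lambda t: b * math.cos(t),
    t_end=2 * math.pi,
)
print(area, math.pi * a * b)   # both about 18.85
```

Shifting the curve (for example, adding constants to $x(t)$ and $y(t)$) leaves the computed area unchanged, as it must.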

FIGURE F2. A unit circle traced out by a moving point in time interval 2π.

In this example, C actually cuts through all four quadrants of the xy-plane, but in the proof I’ll assume C lies entirely in the first quadrant (as drawn in figure F1). This assumption is strictly for convenience, however, and when we are done you’ll see it will in no way weaken the result. And finally, in figure F1, I have drawn C as a closed convex curve and so for each value of x there are at most two values of y. Our result is easy to extend to concave
curves, too, however, by dividing such a C up into sections, each of 
which has just two lines of support. For all curves but for those that 
mathematicians call pathological (i.e., diseased!) this sectioning can 
always be done in a finite number of steps. This issue of concavity is 
actually not important for us in this book, as we will use the result 
only (in chapter 6) to solve the ancient isoperimetric problem of 
determining the closed curve of given perimeter that encloses the 
greatest area. And, as shown in section 2.2, the solution must be 
convex, just as drawn in figure F1. 
So, we begin. If we write the integral 


$$\int_{x_A}^{x_B} y(x)\,dx,$$

we get the area between the top of $C$ and the x-axis (because I have assumed the x-axis lies completely below $C$) if we use $y(x)$ for the upper half of $C$. Writing $dx = \frac{dx}{dt}\,dt$, the integral becomes

$$\text{area under top half of } C = A_1 = \int_0^{T_B} y(t)\frac{dx}{dt}\,dt,$$

where the limits on the integral are changed to match the new integration variable $t$. That is, $x = x_A$ at $t = 0$ and $x = x_B$ at $t = T_B$. Next, if we write the integral

$$\int_{x_B}^{x_A} y(x)\,dx,$$

we get the negative of the area (because $x_B > x_A$) between the bottom of $C$ and the x-axis if we use $y(x)$ for the lower half of $C$. Thus,

$$\text{area under bottom half of } C = A_2 = -\int_{T_B}^{T_A} y(t)\frac{dx}{dt}\,dt.$$

Now, the area inside of $C$, i.e., the enclosed area, is simply $A_1 - A_2$, and so

$$\text{area enclosed by } C = \int_0^{T_B} y(t)\frac{dx}{dt}\,dt + \int_{T_B}^{T_A} y(t)\frac{dx}{dt}\,dt = \int_0^{T_A} y\dot{x}\,dt.$$


This expression is the answer, as it stands, to the question of what 
area is enclosed by C. It does have the property, however, of appear- 
ing to treat x and y differently (one is differentiated and the other is 
not), despite the fact that a choice of coordinate system is arbitrary. 
We can get our answer to look symmetrical (to treat x and y the 
same) with the following last step. 

Imagine we rotate the coordinate axes (and C) counterclockwise 
by 90°, to arrive at figure F3. That is, y is replaced with x and x 
is replaced with —y. Since the enclosed area is a physical invariant 
unaffected by a particular choice of coordinate axes, our result for 
the enclosed area must be 
$$\text{area enclosed by } C = \int_0^{T_A} x(-\dot{y})\,dt = -\int_0^{T_A} x\dot{y}\,dt.$$

FIGURE F3. Coordinate rotation of C does not affect the enclosed area.


(If we had rotated clockwise by 90°, then we would have replaced y 
with —x and x with y, which would lead to the same conclusion. 
Can you see what happens with a 180° rotation, either clockwise or 
counterclockwise? Then we get our original expression back, which, 
while not wrong, is not useful.) Thus, adding this expression to the 
original expression says 


$$\text{twice the area enclosed by } C = \int_0^{T_A} y\dot{x}\,dt - \int_0^{T_A} x\dot{y}\,dt,$$

or, at last, the symmetrical result

$$\text{area enclosed by } C = \frac{1}{2}\int_0^{T_A}(y\dot{x} - x\dot{y})\,dt.$$


This result can easily be shown to be invariant under coordinate 
axes translation and/or rotation, which of course merely says area is 
(as mentioned already) a physical invariant. 
FIGURE F4. The curve C is traced out by the end of a rotating radius vector.


This result is so important that seeing an alternative
derivation is not a waste of time. It is, in fact, a result that is often 
not derived even once in first treatments of the calculus of variations, 
with authors usually writing something like “see any advanced cal- 
culus text for a derivation.” I don’t like that approach, and so let’s 
do it here again. 

We begin anew, this time with the curve C expressed in polar coordinates; i.e., a general point on C is located by drawing the radius vector from the origin to that point, at angle θ and with length r, as shown in figure F4. (Without loss of generality, we imagine that the origin of our coordinate system is inside C, i.e., is surrounded by C.) Then, as θ varies through a total change of 2π radians, the varying length r of the radius vector causes the tip of the radius vector to trace out C in a counterclockwise sense.

Since the little “triangle” swept over by the radius vector through a tiny angular change Δθ has a base of rΔθ and a height of r, its area is ΔA = ½r²Δθ. As we let Δθ → 0, we have ΔA → dA, and so the total area enclosed by C is

$$A = \int dA = \frac{1}{2}\int_0^{2\pi} r^2\,d\theta.$$


Now, in terms of rectangular coordinates, we have the familiar relations

$$x = r\cos(\theta),$$

$$y = r\sin(\theta),$$


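and so the total differentials dx and dy are, in terms of the partial derivatives,

$$dx = \frac{\partial x}{\partial r}\,dr + \frac{\partial x}{\partial \theta}\,d\theta = \cos(\theta)\,dr - r\sin(\theta)\,d\theta$$

and

$$dy = \frac{\partial y}{\partial r}\,dr + \frac{\partial y}{\partial \theta}\,d\theta = \sin(\theta)\,dr + r\cos(\theta)\,d\theta.$$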


From these expressions we can now write

$$x\,dy - y\,dx = [r\cos(\theta)][\sin(\theta)\,dr + r\cos(\theta)\,d\theta] - [r\sin(\theta)][\cos(\theta)\,dr - r\sin(\theta)\,d\theta],$$

which, after expansion and the obvious simplifications, reduces to just r² dθ.
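Written out term by term, the expansion runs as follows; the two dr terms cancel, and cos²(θ) + sin²(θ) = 1 collects what remains:

$$x\,dy - y\,dx = r\sin(\theta)\cos(\theta)\,dr + r^2\cos^2(\theta)\,d\theta - r\sin(\theta)\cos(\theta)\,dr + r^2\sin^2(\theta)\,d\theta = r^2\left[\cos^2(\theta) + \sin^2(\theta)\right]d\theta = r^2\,d\theta.$$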

So, if θ = 0 at time t = 0 and θ = 2π at time t = T, we have (the symbol ∮_C denotes an integral taken completely around the curve C)

$$A = \frac{1}{2}\oint_C (x\,dy - y\,dx) = \frac{1}{2}\int_0^T \left(x\frac{dy}{dt} - y\frac{dx}{dt}\right)dt = -\frac{1}{2}\int_0^T (y\dot{x} - x\dot{y})\,dt.$$


This is just what we got in the first derivation, except for the sign. Remember, however, that now we are moving around C in the counterclockwise sense, opposite to the sense of travel in the first derivation. So all is, indeed, consistent. Two very different approaches give the same result, which should increase our confidence that the result is correct.
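A numerical cross-check of the two derivations (our own illustration, not from the text): the hypothetical test curve r = 2 + cos(θ), a limaçon whose interior contains the origin, is sampled counterclockwise. The polar formula ½∫r² dθ should give 4.5π, since the average of (2 + cos θ)² over a full period is 4.5, while the discrete form of ½∫(yẋ − xẏ) dt should give the same number with the opposite sign.

```python
import math

def r(theta):
    """Hypothetical example curve: the limacon r = 2 + cos(theta), origin inside."""
    return 2.0 + math.cos(theta)

n = 20000
dtheta = 2 * math.pi / n

# Polar formula A = (1/2) * integral of r^2 d(theta), by the midpoint rule.
polar_area = 0.5 * sum(r((k + 0.5) * dtheta) ** 2 for k in range(n)) * dtheta

# Sample the same curve counterclockwise (theta increasing, with t = theta).
pts = [(r(k * dtheta) * math.cos(k * dtheta), r(k * dtheta) * math.sin(k * dtheta))
       for k in range(n)]

# Discrete (1/2) * integral of (y*dx/dt - x*dy/dt) dt, the first derivation's form.
s = 0.0
for i in range(n):
    x0, y0 = pts[i]
    x1, y1 = pts[(i + 1) % n]
    s += y0 * (x1 - x0) - x0 * (y1 - y0)
half_integral = 0.5 * s

print(polar_area)      # approximately 14.1372, i.e. 4.5*pi
print(-half_integral)  # the same value: counterclockwise travel flips the sign
```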


Appendix G. 
Beltrami’s Identity 


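If we multiply the Euler-Lagrange equation of section 6.4,

$$\frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = 0,$$

through by y′, we get

$$y'\frac{\partial F}{\partial y} - y'\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = 0,$$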


where, in general, F = F{x, y(x), y′(x)}. Now, the change dF in F, as we allow each of the three explicit variables (x, y, y′) to change, is given in terms of the partial derivatives of F by

$$dF = \frac{\partial F}{\partial y}\,dy + \frac{\partial F}{\partial y'}\,dy' + \frac{\partial F}{\partial x}\,dx.$$


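Or, dividing through by dx, we have

$$\frac{dF}{dx} = \frac{dy}{dx}\frac{\partial F}{\partial y} + \frac{dy'}{dx}\frac{\partial F}{\partial y'} + \frac{\partial F}{\partial x} = y'\frac{\partial F}{\partial y} + y''\frac{\partial F}{\partial y'} + \frac{\partial F}{\partial x},$$

where y″ = dy′/dx = d²y/dx², assuming y is twice differentiable with respect to x. That is,

$$y'\frac{\partial F}{\partial y} = \frac{dF}{dx} - y''\frac{\partial F}{\partial y'} - \frac{\partial F}{\partial x}.$$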


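Substituting this expression for y′(∂F/∂y) into the first equation above gives

$$\frac{dF}{dx} - y''\frac{\partial F}{\partial y'} - \frac{\partial F}{\partial x} - y'\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = 0.$$

If we now notice that

$$\frac{d}{dx}\left(y'\frac{\partial F}{\partial y'}\right) = y''\frac{\partial F}{\partial y'} + y'\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right),$$

then we can write

$$\frac{d}{dx}\left(F - y'\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial x} = 0.$$

At this point we simply have an alternative form for the Euler-Lagrange equation. But if we now suppose that F has no explicit dependence on x, i.e., if we suppose that

$$\frac{\partial F}{\partial x} = 0,$$

then

$$\frac{d}{dx}\left(F - y'\frac{\partial F}{\partial y'}\right) = 0.$$

This is immediately integrable to give

$$F - y'\frac{\partial F}{\partial y'} = \text{constant}.$$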
This is Beltrami’s identity of 1868, a partially integrated form of the Euler-Lagrange equation for the special (but important) case in which F does not depend explicitly on x. This condition is satisfied in a number of historically important problems, and the Beltrami identity is used extensively in chapter 6.


Appendix H. 
The Last Word on the Lost Fisherman Problem


At the end of chapter 1, I challenged you to find a solution path that is even better (shorter) than the one that gets the fisherman back to shore in no more than 6.9953 miles. Consider the path shown in figure H1, which is but a slight variation of the 6.9953-mile path of figure 1.12; two straight-line segments have been added to the beginning and the end of the circular portion.

Once again, the fisherman rows a distance of √(1 + tan²(θ)) at some arbitrary angle θ to an assumed straight one-mile path to shore. Then, looking back along the path he has just rowed, he turns through an angle of 90° − θ and rows a distance of sin(θ)√(1 + tan²(θ)). This puts him, as shown in figure H1, at an angle of 2θ from the assumed direct path to shore (as well as, once again, one mile from his starting position). He then rows along a circular path of radius one mile until he is once again at angle 2θ with respect to the assumed direct path to shore. That is, as measured from the assumed path to shore, he swings through an angle of 360° − 4θ. Finally, he rows straight ahead along the tangent to the circle at the end of the circular portion of his path. After rowing at most sin(θ)√(1 + tan²(θ)), he is sure to arrive at the shore, because the original (1 + 2π)-mile path lies on or inside this path. The total distance rowed is, with θ measured in radians,


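$$L(\theta) = \sqrt{1 + \tan^2(\theta)} + 2\sin(\theta)\sqrt{1 + \tan^2(\theta)} + 2\pi\left(\frac{2\pi - 4\theta}{2\pi}\right)$$

$$= 2\pi - 4\theta + [1 + 2\sin(\theta)]\sqrt{1 + \tan^2(\theta)}.$$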
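The leg lengths quoted above can be checked with coordinates. The sketch below is our own construction under an assumed frame not stated in the text: the fisherman starts at the origin, the assumed direct path to shore runs along the positive x-axis, and so the assumed shoreline is the line x = 1. In that frame the first leg ends at (1, tan θ), and the second leg should end on the unit circle at angle 2θ, tangent to the circle there. The helper `path_pieces` is hypothetical.

```python
import math

def path_pieces(theta):
    """Leg lengths of the figure H1 path, computed from coordinates.

    Assumed frame (not from the text): start at the origin, assumed direct
    path to shore along the positive x-axis, assumed shoreline x = 1.
    """
    p1 = (1.0, math.tan(theta))                       # end of the first leg
    p2 = (math.cos(2 * theta), math.sin(2 * theta))   # claimed end of the second leg
    leg1 = math.hypot(p1[0], p1[1])
    leg2 = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    arc = 2 * math.pi - 4 * theta   # unit-radius arc through 360 degrees minus 4*theta
    # The second leg is tangent to the unit circle at p2 exactly when its
    # direction is perpendicular to the radius vector p2.
    dot = (p2[0] - p1[0]) * p2[0] + (p2[1] - p1[1]) * p2[1]
    return leg1, leg2, arc, dot

def L(theta):
    """Closed form from the text: 2*pi - 4*theta + [1 + 2 sin(theta)] sqrt(1 + tan^2(theta))."""
    return (2 * math.pi - 4 * theta
            + (1 + 2 * math.sin(theta)) * math.sqrt(1 + math.tan(theta) ** 2))

theta = 0.5
leg1, leg2, arc, dot = path_pieces(theta)
print(leg1, 1 / math.cos(theta))   # equal: sqrt(1 + tan^2) = 1/cos
print(leg2, math.tan(theta))       # equal: sin * sqrt(1 + tan^2) = tan
print(dot)                         # essentially 0, so the second leg is tangent
# By mirror symmetry about the x-axis, the last leg matches the second one.
print(leg1 + 2 * leg2 + arc, L(theta))  # equal
```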
FIGURE H1. The lost fisherman problem, one last time: is this the shortest path possible?


We could use a computer to find numerically the θ that minimizes L(θ), but, somewhat surprisingly, setting dL/dθ = 0 in this more complicated case gives an analytically solvable equation! If you work through the algebra, you should arrive at the equation (quadratic in sin(θ))


$$4\sin^2(\theta) + \sin(\theta) - 2 = 0,$$

which has the positive solution

$$\sin(\theta) = \frac{\sqrt{33} - 1}{8}.$$
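For readers who want to check that algebra, here is one route (a sketch, assuming 0 < θ < π/2 so that √(1 + tan²(θ)) = 1/cos(θ)):

$$L(\theta) = 2\pi - 4\theta + \frac{1 + 2\sin(\theta)}{\cos(\theta)},$$

$$\frac{dL}{d\theta} = -4 + \frac{2\cos^2(\theta) + [1 + 2\sin(\theta)]\sin(\theta)}{\cos^2(\theta)} = -4 + \frac{2 + \sin(\theta)}{\cos^2(\theta)},$$

and setting dL/dθ = 0 gives 2 + sin(θ) = 4cos²(θ) = 4 − 4sin²(θ), which rearranges to the quadratic 4sin²(θ) + sin(θ) − 2 = 0.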


Thus, L(θ) is minimized (do you see why the extremum is not a maximum?) when θ = 0.63487 radians (≈ 36.375°), at which value L = 6.4589 miles, an impressive 11.3% less than the original (2π + 1) miles = 7.2832 miles calculated in chapter 1.
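The text notes that L(θ) could also be minimized numerically. A minimal sketch of that route uses golden-section search (our choice of method; any one-dimensional minimizer would do) on the assumed bracket 0.1 ≤ θ ≤ 1.2 radians, and compares the numerical minimizer with the closed-form sin(θ) = (√33 − 1)/8.

```python
import math

def L(theta):
    """Total distance rowed (miles) for the figure H1 path, theta in radians."""
    return (2 * math.pi - 4 * theta
            + (1 + 2 * math.sin(theta)) * math.sqrt(1 + math.tan(theta) ** 2))

def golden_section_min(f, lo, hi, tol=1e-12):
    """Locate the minimum of a unimodal f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2      # inverse golden ratio, about 0.618
    a, b = lo, hi
    c = b - g * (b - a)
    d = a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):             # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                       # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

theta_numeric = golden_section_min(L, 0.1, 1.2)
theta_exact = math.asin((math.sqrt(33) - 1) / 8)
shortest = L(theta_exact)

print(theta_numeric, theta_exact)         # both about 0.63487 radians
print(shortest)                           # about 6.4589 miles
print(1 - shortest / (2 * math.pi + 1))   # about 0.113, i.e. 11.3% shorter
```

Because L is flat near its minimum, the numerical θ agrees with the exact one only to roughly eight digits, which is more than enough here.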


Appendix I. 
Solution to the New Challenge Problem


In figure I1 I’ve drawn a triangle with side lengths a, b, and c. The three solid lines represent the bisector lines of the vertex angles (dividing the vertex angles into the half-angles α, β, and γ), and these bisector lines meet (as stated in the hint* given in the preface to the paperback edition of the book) at a common point, the point P in the figure. From P I’ve then dropped perpendiculars (shown as dashed lines) to the three sides. This immediately explains where the three pairs of equal lengths (marked as x, y, and z) come from: at each vertex, the two right triangles formed by the bisector and the two perpendiculars share a hypotenuse and a half-angle, so they are congruent. That in turn gives us the equations


$$a = z + y,$$

and

$$b = x + z,$$

and

$$c = x + y.$$

* The proof is elementary. Suppose we call angle A the interior vertex angle formed by sides a and b, angle B the interior vertex angle formed by sides b and c, and angle C the interior vertex angle formed by sides c and a. Then every point on the bisector line of angle A is equidistant (by symmetry) from a and b, and every point on the bisector line of angle B is equidistant (again, by symmetry) from b and c. Thus, the point P where those two bisector lines cross (and they must cross, since they are not parallel lines) is a point on both bisector lines and so is equidistant from a and b as well as equidistant from b and c; that is, point P is equidistant from a, b, and c. That means, in particular, that P is equidistant from a and c, and so P is a point on the bisector line of angle C. That is, all three bisector lines of the interior vertex angles intersect at P.
While it may not be (and almost certainly isn’t) obvious at this point why we would want to, we can now use these three equations to show that

$$a + b - c = (z + y) + (x + z) - (x + y) = 2z,$$

and

$$b + c - a = (x + z) + (x + y) - (z + y) = 2x,$$

and

$$a + c - b = (z + y) + (x + y) - (x + z) = 2y.$$

You’ll soon see how useful these expressions are.


FIGURE I1. The new challenge problem triangle.

Now, a brief pause to establish a result we’ll need to finish our analysis. If u and v are any two real numbers, then it is clear that

$$(u - v)^2 \geq 0$$

or,

$$u^2 - 2uv + v^2 \geq 0$$

or,

$$u^2 + 2uv + v^2 \geq 4uv$$

or,

$$(u + v)^2 \geq 4uv.$$

For our problem u and v are both non-negative numbers (they will be the lengths x, y, and z, taken two at a time), and so this last inequality says that

$$u + v \geq 2\sqrt{uv}.$$


This is, in fact, the most elementary possible special case of the AM-GM inequality, which is proven in much greater generality in appendix A and is used in numerous places in this book. Okay, back to our original problem.

Using our first three equations, we have

$$abc = (z + y)(x + z)(x + y).$$

But since our AM-GM inequality above tells us that z + y ≥ 2√(zy), x + z ≥ 2√(xz), and x + y ≥ 2√(xy), we have

$$abc \geq (2\sqrt{zy})(2\sqrt{xz})(2\sqrt{xy}) = (2z)(2x)(2y).$$


And then, finally, our earlier equations for 2z, 2x, and 2y complete the analysis:

$$abc \geq (a + b - c)(b + c - a)(a + c - b),$$

where a, b, and c are the lengths of the sides of any triangle. Q.E.D. This result is called Padoa’s inequality, after the Italian mathematician Alessandro Padoa (1868–1937).
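A quick randomized check of Padoa’s inequality (a sketch of our own, not from the text): following the substitution used above, any positive x, y, and z give a genuine triangle with a = z + y, b = x + z, c = x + y, so sampling x, y, z at random samples triangles. The sampling range, trial count, and seed are arbitrary choices. Equality holds exactly when x = y = z, i.e., for an equilateral triangle.

```python
import random

def padoa_holds(a, b, c):
    """Padoa's inequality: abc >= (a + b - c)(b + c - a)(a + c - b)."""
    return a * b * c >= (a + b - c) * (b + c - a) * (a + c - b)

def random_triangle(rng):
    """Sides built from positive x, y, z as in the text: a = z + y, b = x + z, c = x + y.

    Any positive x, y, z give a genuine triangle this way.
    """
    x, y, z = (rng.uniform(0.01, 10.0) for _ in range(3))
    return z + y, x + z, x + y

rng = random.Random(1)   # fixed seed so the run is repeatable
triangles = [random_triangle(rng) for _ in range(100_000)]
print(all(padoa_holds(a, b, c) for a, b, c in triangles))  # True
print(padoa_holds(2.0, 2.0, 2.0))   # True, with equality (equilateral, x = y = z)
```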